Abstract

In this paper we propose an overview of the recent academic literature devoted to the applications of Hawkes processes in finance. Hawkes processes constitute a particular class of multivariate point processes that has become very popular in empirical high-frequency finance over the last decade. After a reminder of the main definitions and properties that characterize Hawkes processes, we review their main empirical applications to many different problems in high-frequency finance. Because of their great flexibility and versatility, we show that they have been successfully applied to issues as diverse as estimating the volatility at the level of transaction data, estimating market stability, accounting for systemic risk contagion, devising optimal execution strategies or capturing the dynamics of the full order book.


1 Introduction 

The availability of high-frequency financial data over the last decades has allowed empirical finance to devise and calibrate models of market microstructure that aim to account for intraday market dynamics in their finest details. Until recently, there were only a few continuous-time models for high-frequency price variations. Most approaches relied on discrete-time models that either aggregate the dynamics over the intervals of a regular time grid or consider the succession of discrete events such as trades (these models are generally referred to as “trading time” or “business time” models). Hasbrouck [?] and Engle and Russell [55], in the nineties, were the first to advocate that financial data at the transaction level could be advantageously modeled within the framework of continuous-time “point processes”. Since then, applications of point processes to finance have been an ongoing and very active topic in the econometric literature. We refer the reader to the recent review of Bauwens and Hautsch [5] on this subject.

The first type of point process proposed in the context of market microstructure is the ACD model introduced by Engle and Russell [28]. This model and its variants remain, by far, the most used models in high-frequency econometrics [9]. In this class of models, the process is defined by means of its “hazard function”, which specifies the conditional law of inter-event (or duration) intervals. However, point processes (or counting processes) can alternatively be represented by their “intensity function”, which represents the conditional probability density of the occurrence of an event in the immediate future (see e.g. [22] for a comprehensive textbook on the mathematical properties of point processes).

In a pioneering work, Bowsher [15] recognized the flexibility and the advantages of using the class of multivariate counting processes that can be specified by a conditional intensity vector. More specifically, he introduced a bivariate Hawkes process in order to model the joint dynamics of trades and mid-price changes on the NYSE. Hawkes processes are a class of multivariate point processes that were introduced in the seventies by A. G. Hawkes [37, 38], notably to model the occurrence of seismic events. They involve an intensity vector that is a simple linear function of past events (see e.g. [?, 3] for other examples of point processes with dynamical intensity). Hawkes models are becoming more and more popular in high-frequency finance. This popularity can be explained above all by their great simplicity and flexibility, as anticipated by Bowsher [15]. These models can easily account for the interaction of various types of events, for the influence of some intensive factors (through marks) or for the existence of non-stationarities. They are amenable to statistical inference, and closed-form formulae can be obtained in some particular situations. Moreover, since their parameters have a straightforward interpretation (notably through the cluster representation), they lead to a simple interpretation of many aspects of the complex dynamics of modern electronic markets.

In this paper, we survey recent academic studies using Hawkes processes in the context of finance. As already explained, there are many such studies. In order to present them in relation to one another, we have grouped them by “themes”. Of course, the boundaries of these themes are blurred (e.g., some of the studies belong to several themes), so our choices are unavoidably somewhat arbitrary. However, we believe this grouping helps capture the overall “picture” of Hawkes models in finance.

This survey is organized as follows. Sec. 2 is devoted to the theory of Hawkes processes: it introduces the main definitions and the general properties that will be used throughout the paper. The following Sections focus on the applications of Hawkes processes to finance. Sec. 3 starts with the main univariate models that can be found in the literature, including market activity or risk models (e.g., 1-dimensional market order flow models, extreme return models). Price models (mid-price or best limit price) are presented in Sec. 4, whereas Sec. 5 is devoted to impact models; in that Section, we discuss not only the influence of market order flows on price moves but also the problems related to optimal execution. Models that involve more order flows are presented in Sec. 6, where both so-called level-I models (i.e., dealing “only” with the dynamics of the best limits) and full order book models are discussed. Finally, various studies that do not clearly fit in any of the previous Sections are presented in Sec. 7 (e.g., systemic risk models, high-dimensional models or news models).
Additional material can be found in the Appendices. In Appendix A, all the academic works discussed throughout the paper that involve numerical experiments on financial data are listed in a single table, which summarizes some essential characteristics of the models and data used in each work. Finally, two Appendices sum up the main results about the simulation (App. B) and the estimation (App. C) of Hawkes processes.
2 The Hawkes process

As mentioned in the introduction, Hawkes processes are a class of multivariate point processes, introduced by Hawkes in the early seventies [37, 38], that are characterized by a stochastic intensity vector. If $N_t$ is a vector of counting processes at time $t$, then its intensity vector $\lambda_t$ is defined heuristically as (see e.g. [52] for a rigorous definition):

$$\lambda_t = \lim_{\Delta \to 0} \Delta^{-1}\, \mathbb{E}\left[ N_{t+\Delta} - N_t \mid \mathcal{F}_t \right], \tag{1}$$

where the filtration $\mathcal{F}_t$ stands for the information available up to (but not including) time $t$. In the case of Hawkes processes, $\lambda_t$ is simply a linear function of past jumps of the process, as specified below.


2.1 Definition 


We consider a $D$-variate counting process¹ $N_t = \{N_t^i\}_{i=1}^D$ whose associated intensity vector is denoted as $\lambda_t = \{\lambda_t^i\}_{i=1}^D$.

Definition 1 (Hawkes process). A Hawkes process is a counting process $N_t$ such that the intensity vector can be written as

$$\lambda_t^i = \mu^i + \sum_{j=1}^{D} \int_{-\infty}^{t} \phi^{ij}(t - t')\, dN_{t'}^j, \tag{2}$$

where the quantity $\mu = \{\mu^i\}_{i=1}^D$ is a vector of exogenous intensities, and $\Phi(t) = \{\phi^{ij}(t)\}_{i,j=1}^D$ is a matrix-valued kernel such that:

• It is component-wise positive, i.e., $\phi^{ij}(t) \geq 0$ for each $1 \leq i, j \leq D$;

• It is component-wise causal (if $t < 0$, $\phi^{ij}(t) = 0$ for each $1 \leq i, j \leq D$);

• Each component $\phi^{ij}(t)$ belongs to the space of $L^1$-integrable functions².


Notation 1 (Convolution). We can adopt a more compact notation to rewrite Eq. (2) as

$$\lambda_t = \mu + \Phi * dN_t, \tag{3}$$

by defining the $*$ operation, corresponding to a matrix multiplication in which ordinary products are replaced by convolutions.

Notation 2 (Event times). By introducing the couples $\{(t_m, k_m)\}_{m=1}^M$, where $t_m$ denotes the time of event number $m$ and $k_m \in [1, D]$ indicates its component, Eq. (2) can also be rewritten as

$$\lambda_t^i = \mu^i + \sum_{m=1}^{M} \phi^{i k_m}(t - t_m). \tag{4}$$
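To make Eq. (4) concrete, here is a minimal sketch that evaluates the intensity vector at a given time directly from a list of $(t_m, k_m)$ pairs. It assumes exponential kernels $\phi^{ij}(t) = \alpha^{ij}\beta^{ij}e^{-\beta^{ij}t}\mathbb{1}_{t\in\mathbb{R}^+}$ (the parametric family used in Example 1) and zero-based component labels; the function name `hawkes_intensity` is hypothetical. Its cost grows linearly with the number of past events.

```python
import numpy as np


def hawkes_intensity(t, events, mu, alpha, beta):
    """Evaluate Eq. (4): lambda_t^i = mu^i + sum_{t_m < t} phi^{i k_m}(t - t_m).

    Assumes exponential kernels phi^{ij}(s) = alpha[i, j] * beta[i, j] * exp(-beta[i, j] * s)
    for s > 0. `events` is an iterable of (t_m, k_m) pairs with k_m in {0, ..., D-1}.
    """
    lam = np.array(mu, dtype=float)  # copy, so the caller's mu is not modified
    alpha = np.asarray(alpha, dtype=float)
    beta = np.asarray(beta, dtype=float)
    for t_m, k_m in events:
        if t_m < t:  # causality: only strictly past events contribute
            s = t - t_m
            # column k_m: effect of an event of type k_m on every component i
            lam += alpha[:, k_m] * beta[:, k_m] * np.exp(-beta[:, k_m] * s)
    return lam
```

For instance, `hawkes_intensity(2.0, [(0.5, 0), (1.0, 1)], mu=[0.1, 0.2], alpha=[[0.3, 0.1], [0.2, 0.4]], beta=np.ones((2, 2)))` adds to each exogenous rate the contributions of the two past events, each weighted by the column of $\alpha$ that corresponds to the component of the event.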


Fig. 1 shows, as an example, a specific realization of a multivariate Hawkes process.

Even though the process defined by Eq. (2) is well defined for any choice of kernel $\Phi(t)$ satisfying the three conditions stated above, the stationary case characterized below is of particular relevance in most applications of Hawkes processes to finance (see Sec. 2.2 for some notable exceptions).


¹ A counting process is a stochastic process $\{N_t\}_{t \geq 0}$ taking non-negative integer values and non-decreasing in time. By convention, $N_0 = 0$.

² Strictly speaking, this definition only calls for $\phi^{ij}(t) \in L^1_{\mathrm{loc}}$, but this is of no interest for this paper.
Figure 1: A realization of a multivariate Hawkes process. The dots represent individual events, while the different rows refer to different components $i$.


Proposition 1 (Stationarity). The process $N_t$ has asymptotically stationary increments and $\lambda_t$ is asymptotically stationary if the kernel satisfies the following assumption, also referred to as the stability condition:

(H) The matrix $\|\Phi\| = \{\|\phi^{ij}\|\}_{i,j=1}^D$ has spectral radius smaller than 1.

Notation 3 (Spectral radius). Here and in the following parts of the discussion, given a scalar function $f(t)$ we denote with the symbol $\|f\|$ its $L^1$-norm, defined as $\|f\| = \int_{-\infty}^{+\infty} dt\, |f(t)|$. For a matrix $F = \{F^{ij}\}$, the notation $\|F\|$ denotes its spectral radius. Finally, for a matrix of functions $F(t) = \{f^{ij}(t)\}$ we write $\|F\| = \|\{\|f^{ij}\|\}\|$, i.e., the spectral radius of the matrix of the $L^1$-norms of its components.
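Condition (H) is straightforward to check numerically once the $L^1$-norms of the kernel components are known. The sketch below (plain NumPy; the helper names `spectral_radius` and `is_stable` are hypothetical) takes the matrix $\{\|\phi^{ij}\|\}$ as input and returns its spectral radius, following Notation 3. For exponential kernels $\alpha^{ij}\beta^{ij}e^{-\beta^{ij}t}$ the norms are simply the $\alpha^{ij}$.

```python
import numpy as np


def spectral_radius(norm_matrix):
    """Spectral radius of the matrix of kernel L1 norms {||phi^{ij}||}."""
    eigvals = np.linalg.eigvals(np.asarray(norm_matrix, dtype=float))
    return float(np.max(np.abs(eigvals)))


def is_stable(norm_matrix):
    """Stability condition (H): spectral radius strictly smaller than 1."""
    return spectral_radius(norm_matrix) < 1.0


# Bivariate example with self-norm 0.4 and cross-norm 0.3 (eigenvalues 0.7 and 0.1).
norms = np.array([[0.4, 0.3],
                  [0.3, 0.4]])
print(spectral_radius(norms), is_stable(norms))
```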

From now on, unless otherwise specified, we will always assume that (H) holds and that the Hawkes processes are in the asymptotically stationary regime. In particular, averages taken in the stationary state will be denoted by $\mathbb{E}[\dots]$, while variances will be written as $\mathbb{V}[\dots]$. The consequences of the stationarity assumption (H) will be fully explored in Sec. 2.3. Here, we first provide a simple example of a Hawkes process, whose prototypical version is the one in which the kernel functions $\phi^{ij}(t)$ are exponential.

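Example 1 (Exponential kernel). Consider a bivariate Hawkes process in which the kernel matrix has the form

$$\Phi(t) = \begin{pmatrix} \phi^{(s)}(t) & \phi^{(c)}(t) \\ \phi^{(c)}(t) & \phi^{(s)}(t) \end{pmatrix}, \tag{5}$$

where the kernel components³ have the exponential form

$$\phi^{(s,c)}(t) = \alpha^{(s,c)} \beta^{(s,c)} e^{-\beta^{(s,c)} t}\, \mathbb{1}_{t \in \mathbb{R}^+}, \tag{6}$$

where $\mathbb{1}_x$ is an indicator function equal to 1 if $x$ is true and zero otherwise. The above functions are $L^1$-integrable, so that the choice $\alpha^{(s,c)}, \beta^{(s,c)} > 0$ ensures that the associated Hawkes process is well-defined. For the process to be stable, one additionally needs the spectral radius to satisfy $\|\Phi\| = \|\phi^{(s)}\| + \|\phi^{(c)}\| < 1$ (the eigenvalues of the matrix of norms being $\|\phi^{(s)}\| \pm \|\phi^{(c)}\|$). Since $\|\phi^{(s,c)}\| = \int_0^\infty dt\, \phi^{(s,c)}(t) = \alpha^{(s,c)}$, the stability condition becomes

$$\alpha^{(s)} + \alpha^{(c)} < 1. \tag{7}$$

The parameters $\alpha^{(s,c)}$ can be interpreted as setting the overall strength of the interactions, while the $\beta^{(s,c)}$ control the relaxation time of the perturbations induced from past to future events.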


The Hawkes process with exponential kernels has several advantageous properties: it allows one to compute the expected value of arbitrary functions of $N_t$ (see Sec. 2.3.4 and Ref. [?]), to simulate the process directly (see App. B) and to compute its likelihood efficiently (see App. C.1). Most of these properties descend from a Markov property, which in its simplest form is stated as follows:


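Proposition 2 (Markov property for exponential kernels). Consider a Hawkes process with exponential kernels $\phi^{ij}(t) = \alpha^{ij} \beta e^{-\beta t}\, \mathbb{1}_{t \in \mathbb{R}^+}$. Then the couple $(N_t, \lambda_t)$ is a Markov process. In particular, Eq. (2) for the intensity $\lambda_t$ can be recast in Markovian form as

$$d\lambda_t = \beta\,(\mu - \lambda_t)\, dt + \beta\, \alpha\, dN_t, \tag{8}$$

where the matrix $\alpha = \{\alpha^{ij}\}_{i,j=1}^D$ acts on the vector of jumps $dN_t$.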

This property can be extended to the case in which (i) the coefficients $\beta^{ij}$ differ across components and (ii) the kernel $\Phi$ contains a finite sum of exponentials. The price to pay in this more general setting is the introduction of an extra set of $A$ auxiliary processes $\{\lambda_t^{(a)}\}_{a=1}^A$, suitably chosen so that the resulting $(A+1)$-uple $(N_t, \lambda_t^{(1)}, \dots, \lambda_t^{(A)})$ is Markovian.
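The Markov structure of Proposition 2 makes exact simulation cheap. Below is a minimal sketch of Ogata's thinning algorithm specialized to a univariate exponential kernel $\phi(t) = \alpha\beta e^{-\beta t}$: between events the intensity relaxes deterministically towards $\mu$ as $\lambda \mapsto \mu + (\lambda - \mu)e^{-\beta w}$ (the Markovian dynamics with no jumps) and it jumps by $\alpha\beta$ at each event, so its current value is an upper bound until the next accepted event. The function name and the univariate restriction are our assumptions; the appendix on simulation reviews more general schemes.

```python
import numpy as np


def simulate_exp_hawkes(mu, alpha, beta, T, seed=None):
    """Ogata-style thinning for a univariate Hawkes process on [0, T) with
    kernel phi(t) = alpha * beta * exp(-beta * t), assuming mu > 0 and alpha < 1.
    """
    rng = np.random.default_rng(seed)
    t = 0.0
    lam = mu          # intensity right after the last update (no past events)
    events = []
    while True:
        # The intensity only decays until the next event, so its current
        # value is a valid upper bound for the candidate waiting time.
        lam_bar = lam
        w = rng.exponential(1.0 / lam_bar)
        t += w
        if t >= T:
            break
        # Relaxation towards mu over the waiting time (no jump in between).
        lam = mu + (lam - mu) * np.exp(-beta * w)
        # Accept the candidate with probability lam / lam_bar.
        if rng.uniform() * lam_bar <= lam:
            events.append(t)
            lam += alpha * beta   # each event raises the intensity by alpha * beta
    return np.array(events)
```

With $\mu = 1$, $\alpha = 0.5$, $\beta = 1$ and a long horizon, the empirical rate `len(events) / T` should be close to the stationary average intensity $\mu/(1-\alpha) = 2$.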

In the non-exponential case, the Hawkes process cannot in general be mapped to a Markovian process, so one needs to keep track of its entire past history in order to perform exact simulation and estimation. A particularly well-known example of a non-exponential kernel is the power-law one, proposed in Ref. [59] to describe temporal clusters of seismic activity.


Example 2 (Power-law kernel). Let us now consider the case $D = 1$, and assume the kernel to be parameterized by the regularized power law

$$\phi(t) = \frac{\alpha \beta}{(1 + \beta t)^{1+\gamma}}\, \mathbb{1}_{t \in \mathbb{R}^+}. \tag{9}$$


³ The superscript (s) (resp. (c)) stands for the word self (resp. cross), since it describes the self- (resp. cross-) excitation of the two components.
Also in this case the process is well-defined for $\alpha, \beta > 0$. Since $\|\phi\| = \alpha/\gamma$, the stationarity condition (H) is met for

$$\alpha < \gamma, \tag{10}$$

indicating that the tail exponent $\gamma$ of the power-law kernel must be positive (and larger than $\alpha$) for the increments of the process to be stationary.
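As a numerical sanity check of the stability condition (10), the sketch below evaluates the regularized power-law kernel of Eq. (9) and compares a trapezoidal integral over $[0, L]$ with the closed form $\int_0^L \phi(t)\,dt = (\alpha/\gamma)\,(1 - (1+\beta L)^{-\gamma})$, whose $L \to \infty$ limit is $\|\phi\| = \alpha/\gamma$. The helper names and parameter values are hypothetical; a logarithmic grid is used because the tail decays slowly when $\gamma$ is small.

```python
import numpy as np


def power_law_kernel(t, alpha, beta, gamma):
    """Regularized power-law kernel of Eq. (9), equal to zero for t < 0."""
    t = np.asarray(t, dtype=float)
    out = alpha * beta / (1.0 + beta * np.clip(t, 0.0, None)) ** (1.0 + gamma)
    return np.where(t >= 0.0, out, 0.0)


def power_law_norm(alpha, gamma):
    """Closed-form L1 norm ||phi|| = alpha / gamma (independent of beta)."""
    return alpha / gamma


# Compare a trapezoidal integral on [0, L] with the closed form.
alpha, beta, gamma, L = 0.3, 2.0, 0.5, 1e4
x = np.concatenate(([0.0], np.logspace(-6, np.log10(L), 20001)))
y = power_law_kernel(x, alpha, beta, gamma)
numeric = np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x))
closed_form = power_law_norm(alpha, gamma) * (1.0 - (1.0 + beta * L) ** (-gamma))
print(numeric, closed_form)
```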

2.2 Some extensions 

Although the model defined in the above section is the one originally introduced in [37, 38] and the most widely used in the literature, several generalizations have been proposed since.


2.2.1 Marked Hawkes processes 

Eq. (4) defining the Hawkes process can be enriched by endowing each event with a mark variable, thus obtaining a sequence of event times, components and marks $\{(t_m, k_m, \xi_m)\}_{m=1}^M$. One may further assume that events labeled with different marks have different effects on the future intensities, leading to dynamics for $\lambda_t$ of the type

$$\lambda_t^i = \mu^i + \sum_{m=1}^{M} \phi^{i k_m}(t - t_m, \xi_m). \tag{11}$$

Finally, one needs to introduce a generating mechanism for the marks, which are typically assumed to be i.i.d. random variables drawn with each event and sampled from a common distribution $p(\xi)$. A typical choice for the interaction kernel is $\phi^{ij}(t, \xi) = \phi^{ij}(t)\, \chi^{ij}(\xi)$, in which one assumes a factorized form for the effect of the marks. This mechanism is used to describe events of different weights, and was originally employed to model the occurrence of earthquakes of different magnitudes [59]. In finance, marks can be used to model trades performed at times $t_m$ with different volumes (as e.g. in [?, 5], see Secs. ?? and ??) or a drawdown intensity (as e.g. in [?, ?], see Sec. 3.1). On more general grounds, note that multivariate Hawkes processes can also be seen as an example of Hawkes processes with interacting marks [?].


2.2.2 Exogenous non-stationarity

The exogenous intensity $\mu$ can be generalized to a deterministic function of time $\mu(t)$. This choice allows one to model a non-stationary system in which the interaction kernel is nevertheless independent of time. In finance, this is the case when one wants to model intraday seasonalities (see the review article [9]) and/or spillover effects across successive days (as in [?]).


2.2.3 Endogenous non-stationarity

The scenarios $\|\Phi\| > 1$ and $\|\Phi\| = 1$ have also been considered in order to model non-stationarity induced by endogenous interactions. These two cases present an important difference: while in the former the average intensity grows exponentially in time, in the latter the process may possess a finite average event rate. This second type of non-stationarity, which we will call quasi-stationarity, has raised strong interest in the literature because the condition $\|\Phi\| \approx 1$ is often met when calibrating Hawkes processes to real financial data (see the detailed discussion in Sec. ??). The limiting behavior of a Hawkes process in the regime $\|\Phi\| = 1$ has been analyzed by Bremaud and Massoulie in [?], where it is shown in particular that:

Proposition 3 (Degeneracy of critical, short-range Hawkes). Let $N_t$ be a univariate Hawkes process as in Eq. (2) such that $\|\Phi\| = 1$ and $\mu = 0$. Then if

$$\int_0^\infty dt\, t\, \phi(t) < \infty, \tag{12}$$

the average of the conditional intensity is either 0 or $+\infty$.

Hence, in the $D = 1$ case, short-ranged kernels always lead to trivial processes. The next result shows instead that interactions of broader range allow a richer behavior.

Theorem 1 (Existence of critical, stationary Hawkes). Let us now consider a Hawkes process with $\|\Phi\| = 1$, $\mu = 0$ and

$$\sup_{t \geq 0}\, t^{1+\gamma} \phi(t) \leq R, \tag{13}$$

$$\lim_{t \to \infty} t^{1+\gamma} \phi(t) = r, \tag{14}$$

where $r, R > 0$ and $\gamma \in\, ]0, 1/2[$. Then the average intensity of such a process is finite.

Summarizing, in dimension $D = 1$ a Hawkes process with $\|\Phi\| = 1$ and $\mu = 0$ is trivial when the kernel is short-ranged, whereas non-trivial processes exist for kernels whose tail exponent lies in the interval $\gamma \in\, ]0, 1/2[$.

Notice that, even though in $D = 1$ a quasi-stationary short-ranged Hawkes process is always degenerate, Ref. [32] describes a scaling regime for $N_t$ in which it is possible to obtain a non-degenerate process in the limit $\|\Phi\| \to 1$ by appropriately choosing an observation timescale for the process. This behavior will be reviewed in Sec. 2.3.6.

In the multivariate setting, Ref. [?] shows that even in the presence of kernels satisfying the short-range condition Eq. (12), it is possible to define a non-trivial quasi-stationary Hawkes process in the large-dimensional limit. In particular, Ref. [?] assumes a factorized form of the kernel, of the type

$$\Phi(t) = \alpha\, f(t), \tag{15}$$

with $\|f\| = 1$, and considers the $D \to \infty$ limit of the process $N_t$. Such a limiting regime turns out to be well-defined also when $\|\Phi\| = \|\alpha\| \to 1$ as $D \to \infty$, provided that the matrix $\alpha$ has a sufficiently low density of eigenvalues in the vicinity of the critical point $\|\alpha\| = 1$. Hence, a non-degenerate quasi-stationary limit for a Hawkes process can also be obtained as an effect of the interaction among a large number of components.
2.2.4 Non-linear Hawkes


Non-linear generalizations of the Hawkes process have been considered by several authors [?, ?, ?, ?]. The intensity function in the non-linear case is written as

$$\lambda_t = h\left( \mu + \Phi * dN_t \right), \tag{16}$$

where $h(\cdot)$ is a non-linear function taking values in $\mathbb{R}^+$, applied component-wise. Typical choices for $h$ include $h(x) = x\, \mathbb{1}_{x \in \mathbb{R}^+}$ and $h(x) = e^x$. Note that the stability condition for various functions $h$ was studied by Bremaud and Massoulie in [?]. The main advantage of this extension is the possibility of modeling inhibition through negative-valued kernels, although the price to pay is the loss of mathematical tractability for most properties of the process. Yet, simulation and calibration of the model remain possible even in this scenario [?]. As we will see, negative-valued kernels are found in the context of finance (see for instance Sec. 6.1).

Let us end this section by mentioning that we have only referred to the most common extensions of Hawkes' original model, but many further generalizations have been proposed, e.g., mixed diffusion-Hawkes models [?, ?], Hawkes models with shot-noise exogenous events [?], and Hawkes processes with generation-dependent kernels [?].


2.3 Properties 

The linear structure of the stochastic intensity $\lambda_t$ of a Hawkes process allows us to characterize many of its properties analytically. Notably, its first- and second-order properties are particularly easy to compute, while a cluster representation of the process provides a useful alternative characterization. These properties are reviewed in the following subsections.


2.3.1 First and second order properties 

Assuming hypothesis (H), it is possible to write explicitly the first- and second-order properties of the model in terms of the Laplace transform of the kernel $\Phi$. To do so, we first introduce the function $\Psi$ defined as follows:

Definition 2 (Kernel inversion). Consider a Hawkes process $N_t$ with stationary increments. We define $\Psi(t)$ as the causal solution of the equation

$$\Phi(t) + \Psi(t) * \Phi(t) = \Psi(t). \tag{17}$$

As a consequence of (H), $\Psi(t)$ exists and can be expressed as the infinite convolution

$$\Psi(t) = \Phi(t) + \Phi(t) * \Phi(t) + \Phi(t) * \Phi(t) * \Phi(t) + \dots \tag{18}$$
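Eq. (17) is a Volterra equation of the second kind, so $\Psi$ can also be computed numerically by marching forward in time. The sketch below does this for a univariate kernel with the trapezoidal rule; the function name `inverse_kernel` and the discretization are our choices, not part of the original text. For the exponential kernel $\phi(t) = \alpha\beta e^{-\beta t}$, a short Laplace-domain computation based on Eq. (20) gives $\Psi(t) = \alpha\beta e^{-\beta(1-\alpha)t}$, which serves as a check.

```python
import numpy as np


def inverse_kernel(phi, h, n):
    """Solve Psi = phi + Psi * phi (Eq. (17)) for a univariate causal kernel
    on the grid t_k = k * h, k = 0, ..., n - 1, with the trapezoidal rule."""
    t = h * np.arange(n)
    f = phi(t)
    psi = np.empty(n)
    psi[0] = f[0]                      # the convolution integral vanishes at t = 0
    denom = 1.0 - 0.5 * h * f[0]       # implicit end-point term psi[k] * f[0] / 2
    for k in range(1, n):
        # interior trapezoid terms: sum_{j=1}^{k-1} psi(t_k - t_j) * phi(t_j)
        interior = np.dot(psi[k - 1:0:-1], f[1:k])
        rhs = f[k] + h * (interior + 0.5 * psi[0] * f[k])
        psi[k] = rhs / denom
    return t, psi


# Check against the closed form for an exponential kernel.
a, b = 0.5, 1.0
t, psi = inverse_kernel(lambda s: a * b * np.exp(-b * s), h=0.01, n=1001)
print(np.max(np.abs(psi - a * b * np.exp(-b * (1 - a) * t))))
```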

The matrix function $\Psi$ can be characterized analytically in terms of the Laplace transform of the kernel $\Phi$, which we define as follows:

Notation 4 (Laplace transform). Given a scalar function $f(t) \in L^1(-\infty, +\infty)$, we denote its Laplace transform as

$$\hat f(z) = \int_{-\infty}^{+\infty} dt\, f(t)\, e^{zt}. \tag{19}$$
Vector and matrix Laplace transforms are defined by applying the above transformation component-wise.

In the Laplace domain, Eq. (18) is then mapped to the algebraic relation

$$\hat\Psi(z) = (\mathbb{I} - \hat\Phi(z))^{-1} - \mathbb{I}, \tag{20}$$

where $\mathbb{I}$ denotes the identity matrix, allowing us to state the main result concerning the linear properties of a Hawkes process:

Proposition 4 (First- and second-order statistics). For a Hawkes process $N_t$ with stationary increments, the following propositions hold:

1. The average intensity $\Lambda = \mathbb{E}[dN_t]/dt$ is equal to

$$\Lambda = (\mathbb{I} + \hat\Psi(0))\, \mu. \tag{21}$$

2. The Laplace transform of the linear correlation matrix

$$c(t - t') = \frac{\mathbb{E}[dN_t\, dN_{t'}^\top] - \mathbb{E}[dN_t]\, \mathbb{E}[dN_{t'}^\top]}{dt\, dt'} \tag{22}$$

is equal to

$$\hat c(z) = (\mathbb{I} + \hat\Psi(-z))\, \Sigma\, (\mathbb{I} + \hat\Psi^\top(z)), \tag{23}$$

where $\Sigma$ is a diagonal matrix with non-zero elements equal to $\Sigma^{ii} = \Lambda^i$.


This useful characterization of the linear properties of a Hawkes process, first formulated in [?] and then fully generalized in [5], allows one to (i) obtain the linear predictions of a Hawkes model given $\mu$ and $\Phi$ as an input, and (ii) calibrate the kernel $\Phi$ non-parametrically from empirical data by inverting relations (21) and (23) (see App. C.1).
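Since the kernels are non-negative, $\hat\Phi(0)$ is exactly the matrix of $L^1$-norms $\{\|\phi^{ij}\|\}$, and Eqs. (20)-(21) give $\Lambda = (\mathbb{I} - \hat\Phi(0))^{-1}\mu$. A minimal sketch (the helper name is hypothetical) that solves this linear system instead of forming the inverse explicitly:

```python
import numpy as np


def average_intensity(mu, norm_matrix):
    """Lambda = (I + Psi_hat(0)) mu = (I - Phi_hat(0))^{-1} mu, Eqs. (20)-(21).

    For non-negative kernels Phi_hat(0) is the matrix of L1 norms ||phi^{ij}||.
    """
    K = np.asarray(norm_matrix, dtype=float)
    D = K.shape[0]
    return np.linalg.solve(np.eye(D) - K, np.asarray(mu, dtype=float))
```

For the symmetric bivariate exponential case with $\alpha^{(s)} = 0.4$, $\alpha^{(c)} = 0.3$ and $\mu = (1, 1)$, it returns $1/0.3 \approx 3.33$ for both components, in agreement with the closed form $\lambda_0 = \mu_0/(1 - \alpha^{(s)} - \alpha^{(c)})$ of Example 3.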

Let us point out that, in Ref. [4], the authors have established that, under general conditions, the empirical covariation of a multivariate Hawkes process converges towards its expected value, which can be easily expressed in terms of the Hawkes covariance matrix $c(t)$ given by Eq. (23).


Example 3 (Exponential kernel). Let us consider again the bivariate case described by Eq. (5), in the stationary case $\alpha^{(s)} + \alpha^{(c)} < 1$. In that case, the Laplace transform of the kernel $\hat\Phi(z)$ reads:


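$$\hat\Phi(z) = \frac{1}{2} \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix} \begin{pmatrix} \hat\phi^{(s)}(z) + \hat\phi^{(c)}(z) & 0 \\ 0 & \hat\phi^{(s)}(z) - \hat\phi^{(c)}(z) \end{pmatrix} \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}, \tag{24}$$

implying that the kernel matrix is diagonal in the basis of the symmetric and antisymmetric combinations $N_t^1 \pm N_t^2$. The Laplace transform of the individual components of the kernel is

$$\hat\phi^{(s,c)}(z) = \frac{\alpha^{(s,c)}}{1 - z/\beta^{(s,c)}}, \tag{25}$$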

and, due to the diagonal form of Eq. (24), the inverse kernel $\hat\Psi(z)$ can be computed straightforwardly. If one assumes $\mu = (\mu_0, \mu_0)$, then the relations above imply that $\Lambda = (\lambda_0, \lambda_0)$, with

$$\lambda_0 = \frac{\mu_0}{1 - \alpha^{(s)} - \alpha^{(c)}}. \tag{26}$$










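The Laplace transforms of the lagged cross-correlations of the symmetric and antisymmetric combinations $X_\pm(t)$, defined as

$$c_\pm(t - t') = \frac{\mathbb{E}[(dN_t^1 \pm dN_t^2)(dN_{t'}^1 \pm dN_{t'}^2)] - \mathbb{E}[dN_t^1 \pm dN_t^2]\, \mathbb{E}[dN_{t'}^1 \pm dN_{t'}^2]}{2\, dt\, dt'}, \tag{27}$$

result in

$$\hat c_\pm(z) = \frac{\lambda_0}{\big(1 - \hat\phi^{(s)}(z) \mp \hat\phi^{(c)}(z)\big)\big(1 - \hat\phi^{(s)}(-z) \mp \hat\phi^{(c)}(-z)\big)}. \tag{28}$$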


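The above expression can be inverted explicitly to obtain the lagged cross-correlations in real space. In the simpler case $\beta^{(s)} = \beta^{(c)} = \beta_0$, one obtains for example

$$c_\pm(t) = \lambda_0\, \delta(t) + \frac{\lambda_0 \beta_0\, (\alpha^{(s)} \pm \alpha^{(c)})\,(2 - \alpha^{(s)} \mp \alpha^{(c)})}{2\,(1 - \alpha^{(s)} \mp \alpha^{(c)})}\, e^{-\beta_0 (1 - \alpha^{(s)} \mp \alpha^{(c)})\,|t|}. \tag{29}$$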


Note that a singular component arises at $t = 0$ due to the assumption of unit jumps, $(dN)^2 = dN$. The above formula also shows that, as the spectral radius $\|\Phi\| = \alpha^{(s)} + \alpha^{(c)}$ approaches the instability point $\|\Phi\| = 1$, the decay of the symmetric mode of the correlation function becomes slower and slower. Fig. 2 illustrates this result for the autocorrelation function of the modes $X_\pm$.
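Eq. (29) is easy to evaluate directly. The sketch below (hypothetical helper names; it keeps only the regular part and drops the $\lambda_0\delta(t)$ term) makes the slowing-down explicit through the decay rate $\beta_0(1 - \alpha^{(s)} \mp \alpha^{(c)})$. A useful consistency check is the sum rule $\lambda_0 + \int c^{\mathrm{reg}}_\pm(t)\,dt = \hat c_\pm(0) = \lambda_0/(1 - \alpha^{(s)} \mp \alpha^{(c)})^2$, which follows from Eq. (28) at $z = 0$ since $\hat\phi^{(s,c)}(0) = \alpha^{(s,c)}$.

```python
import numpy as np


def mode_decay_rate(a_s, a_c, beta0, sign=+1):
    """Decay rate beta0 * (1 - a_s - sign * a_c) of the mode X_+ (sign=+1) or X_- (sign=-1)."""
    return beta0 * (1.0 - (a_s + sign * a_c))


def mode_correlation(t, lam0, a_s, a_c, beta0, sign=+1):
    """Regular (t != 0) part of c_pm(t) in Eq. (29), assuming beta^(s) = beta^(c) = beta0.

    The singular lam0 * delta(t) term is not included.
    """
    a = a_s + sign * a_c                       # effective branching ratio of the mode
    amp = lam0 * beta0 * a * (2.0 - a) / (2.0 * (1.0 - a))
    rate = mode_decay_rate(a_s, a_c, beta0, sign)
    return amp * np.exp(-rate * np.abs(np.asarray(t, dtype=float)))


# The symmetric mode relaxes more slowly when the cross-excitation is positive.
print(mode_decay_rate(0.3, 0.4, 1.0, +1), mode_decay_rate(0.3, 0.4, 1.0, -1))
```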




Figure 2: Autocorrelation function (left) and normalized variance (right) of the combinations diagonalizing the interaction kernel appearing in Eq. (24). We have used the parameter set $\alpha^{(s)} = 0$, $\alpha^{(c)} = 0.1$, $\mu_0 = \beta_0 = 1$ in order to simulate a single realization of the process of length $T = 10^{?}$. For such a value of $T$, the theoretical predictions (dashed lines) are almost exactly superimposed on the results of the simulations. The $\delta(t)$ component in the cross-covariance function has been omitted in the left panel for the sake of clarity.


Example 4 (Power-law kernel). Let us now consider again the case of a power-law interaction kernel in dimension $D = 1$. For a kernel parameterized by Eq. (9), the Laplace transform reads


$$\hat\Phi(z) = \alpha\, e^{-z/\beta} \left(-z/\beta\right)^{\gamma}\, \Gamma(-\gamma, -z/\beta), \tag{30}$$


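where $\Gamma(n, z)$ is the upper incomplete Gamma function. The inverse kernel $\Psi(t)$ may be expressed in the Laplace domain as

$$\hat\Psi(z) = \frac{\alpha\, e^{-z/\beta} (-z/\beta)^{\gamma}\, \Gamma(-\gamma, -z/\beta)}{1 - \alpha\, e^{-z/\beta} (-z/\beta)^{\gamma}\, \Gamma(-\gamma, -z/\beta)}. \tag{31}$$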


As shown in Sec. 2.1, in this case the spectral radius is $\|\Phi\| = |\hat\Phi(0)| = \alpha/\gamma$, so that the model is stable for $\alpha < \gamma$. Under this assumption, the average intensity is

$$\Lambda = \mu\, \frac{\gamma}{\gamma - \alpha}. \tag{32}$$


For a fixed value of the exogenous intensity $\mu$, this relation interpolates between a total intensity equal to the exogenous one (in the non-interacting case $\alpha = 0$) and an increasingly large number of events as $\alpha$ approaches the instability point $\alpha = \gamma$.

The Laplace transform of the lagged cross-correlation matrix is

$$\hat c(z) = \frac{\Lambda}{(1 - \hat\Phi(-z))(1 - \hat\Phi(z))}. \tag{33}$$


The above expression cannot be inverted analytically. One can nevertheless relate the tail behavior of the correlations to the small-$z$ behavior of $\hat c(z)$ thanks to Tauberian theorems. In particular, for $\beta t \gg 1$ one has:


r e (7 i)/ 3 t 7 > 1 

I iPt)-'^-'^ for 7 < 1 


(34) 


This behavior is illustrated in Fig. 3, where we compare the autocorrelation function of several univariate Hawkes processes with power-law kernels and different tail exponents. As a final note, we remark that for $\gamma < 1/2$ and close to the instability point $\alpha = \gamma$, the function $c(t)$ obeys an intermediate asymptotics $c(t) \propto t^{2\gamma - 1}$, which holds as long as

$$\beta t \ll \left( \frac{\Gamma(1 - \gamma)}{\gamma/\alpha - 1} \right)^{1/\gamma}. \tag{35}$$


In this particular regime, the Hawkes process develops an apparent Hurst exponent $H = 1/2 + \gamma$. This limiting behavior is a consequence of the quasi-stationarity condition $\|\Phi\| = 1$, analyzed in Ref. [?] and reviewed in Sec. 2.2.3.
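The two quantities introduced in this paragraph are simple closed forms. The sketch below (hypothetical helper names) evaluates the crossover scale of Eq. (35), in units of $1/\beta$, together with the apparent Hurst exponent; it makes visible that the intermediate regime extends without bound as $\alpha \to \gamma$.

```python
import math


def critical_crossover(alpha, gamma):
    """Right-hand side of Eq. (35): (Gamma(1 - gamma) / (gamma / alpha - 1)) ** (1 / gamma).

    The intermediate regime c(t) ~ t^(2 gamma - 1) is expected for beta * t well below
    this value. Only meaningful for 0 < alpha < gamma < 1/2.
    """
    if not (0.0 < gamma < 0.5 and 0.0 < alpha < gamma):
        raise ValueError("expected 0 < alpha < gamma < 1/2")
    return (math.gamma(1.0 - gamma) / (gamma / alpha - 1.0)) ** (1.0 / gamma)


def apparent_hurst(gamma):
    """Apparent Hurst exponent H = 1/2 + gamma in the intermediate regime."""
    return 0.5 + gamma


# The crossover scale grows without bound as alpha approaches gamma.
for alpha in (0.20, 0.24, 0.249):
    print(alpha, critical_crossover(alpha, 0.25), apparent_hurst(0.25))
```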


2.3.2 Characterization through second order statistics 

This section justifies why the Hawkes process can be thought of as the simplest example of an interacting point process: it is entirely characterized by its first- and second-order properties, since means and correlations uniquely determine a Hawkes process through the solution of a Wiener-Hopf system.
Figure 3: Comparison of the autocorrelation function for a set of univariate Hawkes processes with power-law kernels parametrized by Eq. (9). We used the values $\mu = \beta = 1$, $\alpha = 0.97$, and simulated processes of length $T = 5 \cdot 10^{?}$ (for $\gamma = 0.25$ and $0.5$), $T = 10^{?}$ (for $\gamma = 1$) and $T = 5 \cdot 10^{?}$ (for $\gamma = 2$) in order to obtain the curves represented in the plot. The image illustrates the crossover from the exponential decay of correlations obtained for $\gamma > 1$ to the power-law behavior detected for $\gamma < 1$.


Let us start by defining the conditional intensity matrix $g(t)$, for $t > 0$, as

$$g^{ij}(t) = \frac{\mathbb{E}\left[ dN_t^i \mid dN_0^j = 1 \right]}{dt} - \Lambda^i, \quad \forall t > 0. \tag{36}$$

Then one can prove straightforwardly [5] from Eq. (23) that

$$c(t) = \Sigma\, g^\top(t), \quad \forall t > 0, \tag{37}$$

which relates conditional averages and lagged cross-correlations. Hence, by using Eqs. (37) and (??), one can prove that [5]:

Theorem 2 (Wiener-Hopf equation). Consider a Hawkes process defined by Eq. (2) satisfying the stationarity assumption (H). Then the matrix function $X(t) = \Phi(t)$ is the unique solution of the Wiener-Hopf system

$$g(t) = X(t) + X(t) * g(t), \quad \forall t > 0, \tag{38}$$

such that the components $X^{ij}(t)$ are causal and $X^{ij}(t) \in L^1$ for all $i, j$.

This property implies that, once an average intensity vector $\Lambda$ and a conditional expectation $g(t)$ are fixed, there exists at most one Hawkes process consistent with these observables. However, such a process is not guaranteed to exist, since a Hawkes process cannot in general reproduce the linear properties of systems in which inhibition is relevant.

This result is the converse of the one expressed by Eq. (23): while that equation states that fixing a kernel $\Phi(t)$ and an exogenous intensity $\mu$ uniquely determines the correlations, the theorem above states that correlations and average intensities uniquely fix the interactions. While the direct result was first proved in [37], the converse was shown in [8] by using the Wiener-Hopf factorization technique.

Finally, note that the Wiener-Hopf system (38) is very useful in applications to empirical data, as it allows one to estimate non-parametrically the interaction kernel of a Hawkes process from a set of empirical observations (see App. C.2).
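A minimal illustration of Theorem 2 in dimension $D = 1$: discretize the Wiener-Hopf equation (38) on a grid with trapezoidal weights and solve the resulting linear system for the kernel. The test case uses the exponential kernel $\phi(t) = \alpha\beta e^{-\beta t}$, for which Eq. (37) and the univariate analogue of Eq. (29) give $g(t) = \frac{\beta\alpha(2-\alpha)}{2(1-\alpha)}\, e^{-\beta(1-\alpha)|t|}$ (for $D = 1$, $g$ is even in $t$). This is only a sketch with an exactly known $g$; the estimation procedure of App. C.2 works with empirical estimates of $g$ and is not reproduced here. The function name is hypothetical.

```python
import numpy as np


def solve_wiener_hopf(g, h, n):
    """Discretized univariate version of Eq. (38): find phi on t_k = k * h (k < n) with
    g(t) = phi(t) + int_0^inf phi(s) g(t - s) ds  for t >= 0.

    `g` must accept negative arguments. Trapezoidal weights are used and the
    integral is truncated at t_{n-1}, so phi must be negligible beyond that point.
    """
    t = h * np.arange(n)
    w = np.full(n, h)
    w[0] = w[-1] = 0.5 * h                 # trapezoidal end-point weights
    G = g(np.subtract.outer(t, t))         # G[i, j] = g(t_i - t_j)
    A = np.eye(n) + G * w[np.newaxis, :]   # (I + G W) phi = g
    return t, np.linalg.solve(A, g(t))


a, b = 0.5, 1.0
amp = b * a * (2.0 - a) / (2.0 * (1.0 - a))


def g_exp(s):
    """Exact g(t) for the univariate exponential kernel (even in t)."""
    return amp * np.exp(-b * (1.0 - a) * np.abs(s))


t, phi_est = solve_wiener_hopf(g_exp, h=0.02, n=1001)
print(np.max(np.abs(phi_est - a * b * np.exp(-b * t))))
```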


2.3.3 Auto-regressive projection 

A Hawkes process with stationary increments can always be linearly approximated by suitably defined auto-regressive processes. In particular, one can show that [51]:

Proposition 5 (Auto-regressive projection). Consider a Hawkes process $N_t$ defined by Eq. (2) satisfying the stationarity assumption (H). Then the convolution $X_t^{(G)}$ defined by

$$X_t^{(G)} = \left( \delta(t)\, \mathbb{I} + \Psi(t) \right) * \left( \mu\, t + \Sigma^{1/2} W_t \right), \tag{39}$$

where $\Psi(t)$ is the infinite convolution of the Hawkes kernel given by Eq. (18), and $W_t$ is a standard $D$-dimensional Brownian motion, satisfies

$$\frac{\mathbb{E}[dX_t^{(G)}]}{dt} = \Lambda, \tag{40}$$

$$c^{(G)}(t) = c(t), \tag{41}$$

where $\Lambda$ and $c$ denote respectively the average intensity and the lagged cross-correlation matrix of $N_t$, and $c^{(G)}$ is the lagged cross-correlation matrix of the increments of $X_t^{(G)}$.

Hence, it is always possible to match the first- and second-order properties of a stationary Hawkes process with interaction kernel $\Phi$ by using a convolution of Wiener processes. On the other hand, higher-order moments cannot be matched in the same way.


2.3.4 Beyond second-order 

The first- and second-order moments are not the only moments that can be computed analytically for a Hawkes process. In particular, Jovanovic et al. [44] have recently developed a combinatorial procedure that allows one to calculate cumulants (and, consequently, moments) of arbitrary order of a Hawkes process. More precisely, given a set $S$ of component indices taking values in $\{1, \dots, D\}$ and a corresponding set of times $t_S = \{t_1, \dots, t_{|S|}\}$, it is possible to define a cumulant density of a Hawkes process as

$$k_S(t_S) = dt^{-|S|} \sum_{\pi} (|\pi| - 1)!\, (-1)^{|\pi| - 1} \prod_{B \in \pi} \mathbb{E}\Big[ \prod_{i \in B} dN_{t_i}^i \Big], \tag{42}$$

where the sum runs over all the partitions $\pi$ of $S$, $|\pi|$ denotes their number of blocks and $B$ labels individually the blocks of $\pi$. The above equation generalizes the definition of the average intensity, recovered for $|S| = 1$, and of the lagged cross-correlation function, obtained for $|S| = 2$. Moment densities can be obtained from the above expression by writing

$$dt^{-|S|}\, \mathbb{E}\Big[ \prod_{i \in S} dN_{t_i}^i \Big] = \sum_{\pi} \prod_{B \in \pi} k_B(t_B). \tag{43}$$

Ref. [44] shows how to express Eq. (42) as a sum of integral terms, which can be written explicitly in terms of $\mu$ and $\Psi(t)$. As each of these addends can be interpreted as a topologically distinct rooted tree with $|S|$ labeled leaves, the enumeration of all the contributions to Eq. (42) can be performed systematically.
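The combinatorial core of Eq. (42) is the classical moment-to-cumulant relation summed over set partitions. The sketch below (hypothetical helper names) enumerates the partitions recursively and accepts any function returning the joint moment of a block; feeding it the moment densities $dt^{-|B|}\,\mathbb{E}[\prod_{i\in B} dN^i_{t_i}]$ gives the cumulant density $k_S(t_S)$. It does not implement the tree expansion of Ref. [44]. As a check, all cumulants of a Poisson variable equal its mean.

```python
from math import factorial


def set_partitions(items):
    """Yield all partitions of a list into non-empty blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for part in set_partitions(rest):
        # put `first` in its own block ...
        yield [[first]] + part
        # ... or add it to each existing block in turn
        for i in range(len(part)):
            yield part[:i] + [[first] + part[i]] + part[i + 1:]


def cumulant_from_moments(labels, moment):
    """Sum over partitions pi of (|pi|-1)! (-1)^(|pi|-1) prod_B moment(B), as in Eq. (42).

    `moment` maps a tuple of labels (one block) to the corresponding joint moment.
    """
    total = 0.0
    for part in set_partitions(list(labels)):
        n_blocks = len(part)
        term = factorial(n_blocks - 1) * (-1) ** (n_blocks - 1)
        for block in part:
            term *= moment(tuple(block))
        total += term
    return total


# Raw moments of a Poisson variable with mean 2: m1 = 2, m2 = 6, m3 = 22.
def poisson_moment(block):
    return {1: 2.0, 2: 6.0, 3: 22.0}[len(block)]


print(cumulant_from_moments([0, 1, 2], poisson_moment))
```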


In the special case of Hawkes processes with exponential kernels, a useful result is obtained by Errais et al. [55] by exploiting Dynkin's formula for the couple $(N, \lambda)$ in the marked framework described in Sec. 2.2.1. In particular, they are able to express the generating function for the couple $(N, \lambda)$ in terms of the solution of an ordinary differential equation. While it may be necessary to solve the equation for the generating function numerically, closed-form expressions are available for specific moments of $N_t$ (see also the work of Dassios and Zhao [53] for similar results on a slight generalization of Hawkes processes).


2.3.5 Martingale representation 

The process defined by Eq. (2) admits a convenient martingale representation once one introduces suitably defined compensators $\int_0^t ds\, \lambda_s$ [22].

Theorem 3 (Martingale representation). Given a Hawkes process $N_t$, the $D$ stochastic processes

$$Y_t = N_t - \int_0^t ds\, \lambda_s \tag{44}$$

are martingales with respect to the canonical filtration of the process $N_t$ [22]. Additionally, under the stability condition (H), the stochastic intensity $\lambda_t$ admits the representation

$$\lambda_t = \mu + \int_0^t \Psi(t - s)\, \mu\, ds + \int_0^t \Psi(t - s)\, dY_s. \tag{45}$$


While the above result is valid even in the non-stationary regime, Eq. (45) takes a particularly simple form in the asymptotic regime of large t, where it can be written as

$$ \lambda_t \simeq \Lambda + \int_0^t \Psi(t-s)\,dY_s \qquad (46) $$

thanks to Eq. (21), since $\mu + \int_0^\infty \Psi(s)\,\mu\,ds = (I + \|\Psi\|)\,\mu = \Lambda$. The martingale property of the Hawkes process plays an important role in determining the first- and second-order properties discussed in the above section, which are derived by means of Eq. (46) in [?]. The representation of Eq. (45) is also particularly useful to perform predictions of the intensity of the Hawkes process given the historical filtration $\mathcal{F}_t$ of the process.
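The kernel $\Psi$ entering Eq. (45) is the resolvent $\Psi = \sum_{n\ge 1}\Phi^{*n}$, equivalently the solution of $\Psi = \Phi + \Phi * \Psi$. As a hedged univariate illustration, the Python sketch below solves this Volterra equation on a grid with the trapezoidal rule (our choice of discretization; grid step and parameters are arbitrary) and compares it with the closed form $\Psi(t) = \alpha\beta e^{-\beta(1-\alpha)t}$, which holds for the exponential kernel $\phi(t) = \alpha\beta e^{-\beta t}$.

```python
import math


def resolvent_kernel(phi, t_max, dt):
    """Solve Psi(t) = phi(t) + int_0^t phi(u) Psi(t - u) du on a uniform grid
    (trapezoidal rule), i.e. Psi = sum_{n>=1} phi^{*n} for a univariate kernel."""
    n = int(round(t_max / dt)) + 1
    grid = [k * dt for k in range(n)]
    ph = [phi(t) for t in grid]
    psi = [0.0] * n
    psi[0] = ph[0]  # the convolution term vanishes at t = 0
    for k in range(1, n):
        # interior trapezoid nodes u_j = j * dt, j = 1..k-1
        interior = sum(ph[j] * psi[k - j] for j in range(1, k))
        rhs = ph[k] + dt * (interior + 0.5 * ph[k] * psi[0])
        # the u = 0 end point contains the unknown Psi(t_k) itself
        psi[k] = rhs / (1.0 - 0.5 * dt * ph[0])
    return grid, psi


alpha, beta = 0.6, 2.0
grid, psi = resolvent_kernel(lambda t: alpha * beta * math.exp(-beta * t), 3.0, 0.005)
# closed form for the exponential kernel: Psi(t) = alpha*beta*exp(-beta*(1-alpha)*t)
exact = alpha * beta * math.exp(-beta * (1 - alpha) * grid[200])
print(grid[200], psi[200], exact)
```

The quadratic cost in the number of grid points is acceptable for a check like this one; longer horizons would call for an FFT-based convolution.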

Example 5 (Prediction of the intensity). Suppose that, given a Hawkes process which satisfies the stationarity condition (H), one is interested in computing the predictor $\mathbb{E}[\lambda_t \mid \mathcal{F}_s]$ for $t > s$ (see Ref. [?] for direct applications in finance). While, by naively applying the definition of the Hawkes process, one can obtain an implicit equation of the type

$$ \mathbb{E}[\lambda_t \mid \mathcal{F}_s] = \mu + \int_0^s \Phi(t-u)\,dN_u + \int_s^t \Phi(t-u)\,\mathbb{E}[\lambda_u \mid \mathcal{F}_s]\,du, \qquad (47) $$

the martingale representation Eq. (45) can be used in order to write the explicit expression

$$ \mathbb{E}[\lambda_t \mid \mathcal{F}_s] = \mu + \int_0^t \Psi(t-u)\,\mu\,du + \int_0^s \Psi(t-u)\,dY_u, \qquad (48) $$

since the martingale increments $dY_u$ have zero conditional mean for $u > s$.
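For the exponential kernel $\phi(t) = \alpha\beta e^{-\beta t}$ with $\alpha < 1$, inserting $\Psi(t) = \alpha\beta e^{-\beta(1-\alpha)t}$ into Eq. (48) collapses the predictor to $\mathbb{E}[\lambda_t \mid \mathcal{F}_s] = \Lambda + (\lambda_{s+} - \Lambda)\,e^{-\beta(1-\alpha)(t-s)}$ with $\Lambda = \mu/(1-\alpha)$. The Python sketch below (function names are illustrative) evaluates this closed form and cross-checks it against an Euler integration of the implicit Eq. (47), which for this kernel reduces to a linear ODE.

```python
import math


def intensity(history, s, mu, alpha, beta):
    """lambda_{s+} = mu + sum_{t_i <= s} alpha*beta*exp(-beta (s - t_i)),
    assuming the exponential kernel phi(t) = alpha*beta*exp(-beta t)."""
    return mu + sum(alpha * beta * math.exp(-beta * (s - ti)) for ti in history if ti <= s)


def predict_intensity(history, s, t, mu, alpha, beta):
    """E[lambda_t | F_s] for t >= s from Eq. (48), using
    Psi(t) = alpha*beta*exp(-beta (1 - alpha) t); requires alpha < 1."""
    Lam = mu / (1.0 - alpha)  # stationary average intensity
    lam_s = intensity(history, s, mu, alpha, beta)
    return Lam + (lam_s - Lam) * math.exp(-beta * (1.0 - alpha) * (t - s))


def predict_by_euler(history, s, t, mu, alpha, beta, steps=20000):
    """Cross-check: differentiating Eq. (47) for this kernel gives
    m'(u) = -beta (m - mu) + alpha*beta*m with m(s) = lambda_{s+}."""
    h = (t - s) / steps
    m = intensity(history, s, mu, alpha, beta)
    for _ in range(steps):
        m += h * (-beta * (m - mu) + alpha * beta * m)
    return m


events = [0.5, 1.0, 1.2]
print(predict_intensity(events, 1.5, 3.0, 0.4, 0.5, 2.0),
      predict_by_euler(events, 1.5, 3.0, 0.4, 0.5, 2.0))
```

The predictor relaxes exponentially from the current intensity towards $\Lambda$ at rate $\beta(1-\alpha)$, so the effective memory lengthens as the process approaches criticality.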


2.3.6 Scaling limit of the process 

The structure of Hawkes processes is naturally adapted to describe systems in which the discrete nature of the jumps in the coordinates $N_t$ is relevant, making this model especially suitable for modeling high-frequency data. In many applications, however, one also needs to control the limiting behavior of the system in the opposite regime of low frequencies, where the granularity of events is disregarded and the scaling in time of $N_t$ needs to be known.


Diffusion towards a Brownian motion. The following results, first proved in [4], allow us to establish that Hawkes processes, under appropriate hypotheses and after a suitable rescaling, behave at large times as linear combinations of Wiener processes.

Theorem 4 (Law of large numbers). Consider a Hawkes process satisfying the stationarity assumption (H). Then

$$ \sup_{u\in[0,1]} \big\| T^{-1} N_{uT} - u\Lambda \big\| \xrightarrow[T\to\infty]{} 0 \qquad (49) $$

almost surely and in $L^2$-norm.

The above result establishes a law of large numbers for the Hawkes process, valid for any stationary kernel. Under additional hypotheses, it is also possible to formulate a corresponding functional central-limit theorem.

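Theorem 5 (Central-limit theorem). Suppose that, in addition to (H), for all $i, j \le D$ the kernel $\Phi(t)$ satisfies

$$ \int_0^\infty t^{1/2}\,\phi^{ij}(t)\,dt < \infty. \qquad (50) $$

Then, for $u \in [0,1]$, the following convergence holds in law for the Skorokhod topology:

$$ T^{1/2}\big(T^{-1} N_{uT} - u\Lambda\big) \xrightarrow[T\to\infty]{} (I - \|\Phi\|)^{-1}\,\Sigma^{1/2}\,W_u, \qquad (51) $$

where $W_u$ denotes a standard D-dimensional Brownian motion and $\Sigma$ is the diagonal matrix with entries $\Sigma^{ii} = \Lambda^i$.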



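Example 6 (Exponential kernel). Consider the bivariate Hawkes process analyzed in Sec. 2.1, $\Lambda_0$ being the common average intensity of its two components. Eqs. (49) and (51) above imply that

$$ N_{uT} - \Lambda_0 uT \begin{pmatrix} 1 \\ 1 \end{pmatrix} \underset{T\to\infty}{\sim} \frac{(\Lambda_0 T)^{1/2}}{(1-\alpha^{(s)})^2 - (\alpha^{(c)})^2} \begin{pmatrix} 1-\alpha^{(s)} & \alpha^{(c)} \\ \alpha^{(c)} & 1-\alpha^{(s)} \end{pmatrix} W_u, \qquad (52) $$

which, in terms of the combinations $N^{\pm}_t = N^1_t \pm N^2_t$, reads

$$ N^{\pm}_{uT} \underset{T\to\infty}{\sim} (1\pm 1)\,\Lambda_0 uT + \frac{(2\Lambda_0 T)^{1/2}}{1-\alpha^{(s)} \mp \alpha^{(c)}}\,W^{\pm}_u, \qquad (53) $$

where $W^{\pm}_u = (W^1_u \pm W^2_u)/\sqrt{2}$ are independent standard Wiener processes.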


This behavior is summarized in Fig. 4, where we compare the rescaled processes at different timescales T.


[Plot: rescaled $N^{\pm}_{uT}$ versus $u \in [0, 1]$]


Figure 4: The figure illustrates the shape of the process $N^{\pm}_{uT}$ at different timescales T = 10, 100, 1000 for the same choice of parameters as in Fig. [?]. The process has been rescaled according to Eq. (53) so as to obtain a standard Wiener process $W_u$ in the limit $T \to \infty$. The plot illustrates how the presence of discrete jumps, clearly visible at small times, becomes irrelevant in the scaling limit $T \to \infty$.
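Eq. (53) can be checked by computing the limiting covariance per unit time implied by Eq. (51), $(I - \|\Phi\|)^{-1}\,\Sigma\,(I - \|\Phi\|)^{-\top}$, for the symmetric kernel norms $\|\Phi\| = \begin{pmatrix}\alpha^{(s)} & \alpha^{(c)}\\ \alpha^{(c)} & \alpha^{(s)}\end{pmatrix}$ with $\Sigma = \Lambda_0 I$. This small Python sketch (plain lists, hypothetical function names) recovers the variances $2\Lambda_0/(1-\alpha^{(s)}\mp\alpha^{(c)})^2$ of $N^{\pm}$.

```python
def asymptotic_covariance(Lambda0, a_s, a_c):
    """Limit of Cov(N_T) / T from Eq. (51) for the symmetric bivariate example:
    (I - K)^{-1} Sigma (I - K)^{-T}, K = [[a_s, a_c], [a_c, a_s]], Sigma = Lambda0 * I.
    Stability (H) requires a_s + a_c < 1."""
    det = (1 - a_s) ** 2 - a_c ** 2
    inv = [[(1 - a_s) / det, a_c / det],
           [a_c / det, (1 - a_s) / det]]
    # inv is symmetric and Sigma is proportional to I, so the product is Lambda0 * inv @ inv
    return [[Lambda0 * sum(inv[i][k] * inv[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def plus_minus_variances(cov):
    """Per-unit-time variances of N^+ = N^1 + N^2 and N^- = N^1 - N^2."""
    var_plus = cov[0][0] + cov[1][1] + 2 * cov[0][1]
    var_minus = cov[0][0] + cov[1][1] - 2 * cov[0][1]
    return var_plus, var_minus


cov = asymptotic_covariance(1.0, 0.3, 0.2)
# Eq. (53) predicts 2 * Lambda0 / (1 - a_s -+ a_c)^2, i.e. 8.0 and about 2.469 here
print(plus_minus_variances(cov))
```

The vectors $(1, \pm 1)$ are eigenvectors of the symmetric matrix $\|\Phi\|$, which is why $N^{+}$ and $N^{-}$ decouple in the limit.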


Other scaling limits. It is well known that many of the microscopic observables involved in the price formation process do not diffuse at large scales (e.g., at the daily, or even monthly, time-scale) toward Brownian motion dynamics. For instance, the trading activity (i.e., the market-order flow) seems to be long-range dependent [?], while the volatility displays clusters that can last for months. Since, as we will see in the next sections, Hawkes processes tend to mimic very well the high-frequency dynamics of financial time-series, it is natural to ask whether scaling limits other than the one described by Theorem 5 can be found. In a recent work by Jaisson and Rosenbaum [42], a sequence of rescaled univariate Hawkes processes

$$ X^{(T)}_t = \frac{1-a_T}{T}\,N^{(T)}_{tT}, \qquad t \in [0,1], \qquad (54) $$

indexed by a time-scale parameter T, where $a_T \in [0,1[$, is proved to converge, when $T \to +\infty$, towards an integrated Cox-Ingersoll-Ross process [20] under the following main conditions:

• the corresponding sequence of kernels satisfies $\phi_T(t) = a_T\,\phi(t)$, where $\phi$ is differentiable and such that $\|\phi\| = \hat{\phi}(0) = 1$, $|\hat{\phi}'(0)| = \int_0^\infty t\,\phi(t)\,dt < +\infty$, $\|\phi'\|_\infty < +\infty$ and $\|\phi'\| < +\infty$,

• the critical value $a_T = 1$ of the stability condition (H) is approached at the speed

$$ \lim_{T\to+\infty} (1-a_T)\,T = k, \qquad k > 0. \qquad (55) $$

Hence, even though a quasi-stationary short-ranged Hawkes process is always degenerate, one can detect a non-trivial behavior of the rescaled counting function by suitably choosing an observation timescale $T \sim (1-a_T)^{-1}$. Let us point out that the above conditions do not allow $\phi$ to have a power-law decay with an exponent strictly smaller than 2 (in the limit $t \to +\infty$). As reviewed in Sec. 3.1, several studies tend to show that, for many financial time-series (e.g., market-order flow time-series), the relevant kernel has power-law tails with an exponent above but rather close to 1. Thus, strictly speaking, the previous framework is inappropriate to describe such behavior. Using the same asymptotics ($T \to +\infty$), Jaisson [?] studied the limit of the correlation function of a 1-dimensional Hawkes process with a power-law kernel decreasing with an exponent $1+\gamma$, with $\gamma \in\, ]0, 1/2[$. He proved that the auto-correlation function of the Hawkes process decreases asymptotically as a power law with an exponent $1-2\gamma$, i.e., it displays long-range dependence.

For a precise formulation of the corresponding theorem, we refer the reader to [42].

Let us notice that qualitative arguments for a similar result were also given in [7] in a 2-dimensional framework.


2.3.7 Clustering representation


Another useful property of the Hawkes process is the clustering property, which emerges as a consequence of the linearity of the intensity with respect to past events. This property allows one to (i) build an efficient simulation algorithm for the process (see the Appendix), (ii) introduce the notion of parenthood among different events (see Sec. 2.3.8 below), and (iii) infer parenthood relations among successive events from empirical data (see the paragraph about the EM method in App. C.2).


Proposition 6 (Clustering representation). Consider a positive integer D and a (not necessarily finite) time interval [0,T], in which we define a sequence of events $\{(t_m, k_m)\}_m$ according to the following procedure:

• For each $1 \le i \le D$, consider a set of immigrant events $\{(t^{(0)}_m, i)\}_m$ extracted with homogeneous Poissonian rate $\mu^i$ in the interval [0,T].

• For each immigrant event of type j, labeled by $(t^{(0)}_m, j)$, and for each $1 \le i \le D$, generate a sequence of first-generation events $\{(t^{(1)}_{m'}, i)\}_{m'}$ sampled with time-dependent Poissonian rate $\phi^{ij}(t - t^{(0)}_m)$ in the interval $[t^{(0)}_m, T]$.

• Iterate the above rule from generation n-1 to generation n, so as to obtain the event sequence $\{(t^{(n)}_m, k^{(n)}_m)\}_m$, until no more events are generated in [0,T].

Then the union of all the events

$$ \bigcup_{n=0}^{\infty} \{(t^{(n)}_m, k^{(n)}_m)\}_m \qquad (56) $$

corresponds to the one generated by the Hawkes process in the time interval [0,T].
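Prop. 6 translates directly into a simulation algorithm. The Python sketch below is a minimal univariate version, assuming the exponential kernel $\phi(t) = \alpha\beta e^{-\beta t}$ with $0 \le \alpha < 1$: each event of generation n-1 receives a Poisson($\alpha$) number of children ($\alpha = \|\phi\|$) placed after i.i.d. exponential delays of rate $\beta$, which is equivalent to a Poisson process with rate $\phi(t - t_m)$, and children falling after T are discarded. It illustrates the branching construction and is not meant to reproduce any specific algorithm from the Appendix; the function names are hypothetical.

```python
import math
import random


def poisson_small(mean, rng):
    """Poisson sample via Knuth's multiplication method (fine for small means)."""
    limit, k, p = math.exp(-mean), 0, rng.random()
    while p > limit:
        k += 1
        p *= rng.random()
    return k


def simulate_hawkes_clusters(mu, alpha, beta, T, rng):
    """Univariate Hawkes process on [0, T] built generation by generation,
    assuming phi(t) = alpha * beta * exp(-beta * t) with 0 <= alpha < 1."""
    # generation 0: immigrants from a homogeneous Poisson process of rate mu
    generation, t = [], rng.expovariate(mu)
    while t <= T:
        generation.append(t)
        t += rng.expovariate(mu)
    events = list(generation)
    while generation:  # stop once a generation has no offspring inside [0, T]
        children = []
        for parent in generation:
            # Poisson process of rate phi(t - parent) on [parent, T]:
            # Poisson(alpha) children (alpha = ||phi||) after Exp(beta) delays,
            # keeping only those that fall before T
            for _ in range(poisson_small(alpha, rng)):
                child = parent + rng.expovariate(beta)
                if child <= T:
                    children.append(child)
        events.extend(children)
        generation = children
    return sorted(events)


rng = random.Random(0)
counts = [len(simulate_hawkes_clusters(1.0, 0.5, 2.0, 200.0, rng)) for _ in range(50)]
# E[N_T] is close to mu * T / (1 - alpha) = 400, minus a small edge effect
print(sum(counts) / len(counts))
```

Because each generation only depends on the previous one, the cost is linear in the number of generated events, with no thinning step.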

Note that the construction above (depicted in Fig. 5) can be equivalently taken as a definition of the Hawkes process, once the information encoding the generation n is discarded by taking the union of all the events. Indeed, this richer definition characterizes the stationarity condition (H) more transparently: by considering $T = \infty$ in the above construction, it is possible to map the branching structure of the Hawkes process described above onto that of a Galton-Watson tree with average offspring $\|\Phi\|$.

Figure 5: Cluster representation of a Hawkes process: while the upper panel represents the branching structure of a bivariate Hawkes process, the lower panel shows its projection obtained by disregarding the cluster structure. The different components $i \in \{1,2\}$ are shown in different colors, while the connected structures in the upper panel denote three different clusters.

The qualitative behavior of the model is in fact dictated by the average number of events generated by a parent event, equal to $\hat{\Phi}(0) = \|\Phi\|$. The three phases of the Hawkes process (stationary $\|\Phi\| < 1$, non-stationary $\|\Phi\| > 1$ and quasi-stationary $\|\Phi\| = 1$) then correspond to the three phases of a Galton-Watson branching process, more precisely:

1. For $\|\Phi\| < 1$, we have a sub-critical phase in which each parent event generates on average less than one child event. This implies that the total progeny of each event is a.s. finite and that the average number of generations before extinction is finite.

2. For $\|\Phi\| > 1$, we have a super-critical phase in which each parent event generates on average more than one child event. In that case, the total progeny of a parent event may be infinite with non-zero probability.

3. For $\|\Phi\| = 1$ (the critical case), the total progeny is a.s. finite, but its size has large fluctuations, leading to a divergence of the average number of generations before extinction.
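The sub-critical phase can be illustrated numerically. For $T = \infty$, the number of children of each event is Poisson-distributed with mean $\|\phi\|$, so the cluster of one immigrant is a Galton-Watson tree with Poisson offspring, whose mean total size (root included) is $1/(1-\|\phi\|)$ when $\|\phi\| < 1$. The Python sketch below, a hypothetical univariate example, estimates this mean by Monte Carlo; the `cap` argument only guards against unbounded trees in the other two phases.

```python
import math
import random


def poisson_small(mean, rng):
    """Poisson sample via Knuth's multiplication method (fine for small means)."""
    limit, k, p = math.exp(-mean), 0, rng.random()
    while p > limit:
        k += 1
        p *= rng.random()
    return k


def cluster_size(branching_ratio, rng, cap=10**6):
    """Total progeny (root included) of one immigrant when every event has a
    Poisson(branching_ratio) number of children; `cap` stops runaway trees."""
    size, pending = 1, 1  # pending = events whose children are not drawn yet
    while pending and size < cap:
        children = poisson_small(branching_ratio, rng)
        size += children
        pending += children - 1
    return size


rng = random.Random(1)
sizes = [cluster_size(0.5, rng) for _ in range(20000)]
# sub-critical phase: the mean cluster size should be near 1 / (1 - 0.5) = 2
print(sum(sizes) / len(sizes))
```

Running the same experiment with a branching ratio of 1 shows the heavy-tailed cluster sizes of the critical case, with the sample mean failing to settle as the number of trees grows.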

2.3.8 Causality 

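The parenthood relation introduced by the clustering representation of the Hawkes process allows us to discuss the problem of causality in this context. In particular, after constructing a Hawkes process according to the branching procedure described above, one can introduce the counting functions

$$ N^{i\leftarrow 0}_t = \#\{\text{exogenously generated events of type } i\} \qquad (57) $$

$$ N^{i\leftarrow j}_t = \#\{\text{events of type } i \text{ with a type-}j \text{ direct ancestor}\} \qquad (58) $$

$$ N^{i\Leftarrow j}_t = \#\{\text{events of type } i \text{ with a type-}j \text{ oldest ancestor}\} \qquad (59) $$

(all events being counted in [0,t]), so that $N^{i\leftarrow 0}_t + \sum_j N^{i\leftarrow j}_t = N^{i\leftarrow 0}_t + \sum_j N^{i\Leftarrow j}_t = N^i_t$. Hence, these quantities can be used in order to express the overall number of events of type i generated by an ancestor of a given type j. In particular, it is easy to prove that:

Proposition 7 (Causality). For a stationary Hawkes process, the average increments of $N^{i\leftarrow 0}_t$, $N^{i\leftarrow j}_t$ and $N^{i\Leftarrow j}_t$ are expressed by

$$ \mathbb{E}[dN^{i\leftarrow 0}_t]/dt = \mu^i \qquad (60) $$

$$ \mathbb{E}[dN^{i\leftarrow j}_t]/dt = \|\phi^{ij}\|\,\Lambda^j \qquad (61) $$

$$ \mathbb{E}[dN^{i\Leftarrow j}_t]/dt = \|\psi^{ij}\|\,\mu^j \qquad (62) $$

where $\psi^{ij}$ denote the entries of $\Psi$.

The property above can be used in order to estimate the average fraction of events (directly or indirectly) caused by a specific component of a Hawkes process.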
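The three average rates $\mu^i$, $\|\phi^{ij}\|\Lambda^j$ and $\|\psi^{ij}\|\mu^j$ of Eqs. (60)-(62) involve only kernel norms. Assuming $\|\psi^{ij}\|$ is computed as the $(i,j)$ entry of $(I - \|\Phi\|)^{-1} - I = \sum_{n\ge1}\|\Phi\|^n$, the NumPy sketch below (illustrative parameter values, hypothetical function name) evaluates them, checks that each decomposition adds up to $\Lambda^i$, and computes the fraction of type-1 events whose oldest ancestor is of type 2.

```python
import numpy as np


def causality_rates(mu, K):
    """Average rates of Eqs. (60)-(62) for a stationary multivariate Hawkes process.
    mu: exogenous intensities, shape (D,); K: kernel norms ||phi^{ij}||, shape (D, D)."""
    mu, K = np.asarray(mu, dtype=float), np.asarray(K, dtype=float)
    eye = np.eye(len(mu))
    if np.max(np.abs(np.linalg.eigvals(K))) >= 1:
        raise ValueError("stability condition (H) violated")
    Lam = np.linalg.solve(eye - K, mu)         # Lambda = (I - ||Phi||)^{-1} mu
    Psi_norm = np.linalg.inv(eye - K) - eye    # ||Psi|| = sum_{n>=1} ||Phi||^n
    exogenous = mu                             # Eq. (60): rate mu^i
    direct = K * Lam[np.newaxis, :]            # Eq. (61): entry (i, j) = ||phi^{ij}|| Lambda^j
    oldest = Psi_norm * mu[np.newaxis, :]      # Eq. (62): entry (i, j) = ||psi^{ij}|| mu^j
    return Lam, exogenous, direct, oldest


Lam, exo, direct, oldest = causality_rates([0.2, 0.1], [[0.4, 0.3], [0.1, 0.5]])
# both decompositions should add up to the total rates Lambda^i
print(Lam, exo + direct.sum(axis=1), exo + oldest.sum(axis=1))
# fraction of type-1 events descending from a type-2 immigrant
print(oldest[0, 1] / Lam[0])
```

Only the matrix of kernel norms enters, so the same computation applies to any estimated kernel shape once its integrals are known.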

Alternative notions of causality for point processes have also been investigated. In particular, the notion of Granger causality has been extended to point processes in [?].

3 Univariate models


3.1 Models of market activity and risk 

The first straightforward application of Hawkes processes in high-frequency finance is probably to model the so-called volatility clustering phenomenon. Since volatility at the transaction level can be directly related to the number $N_\tau$ of events of a given type (trades, mid-price changes, ...) that occur in a time interval of size $\tau$, the self-exciting nature of Hawkes processes provides a very simple picture that can explain the correlated nature of volatility fluctuations. This idea was first proposed by Bowsher [?], who calibrated a univariate Hawkes model with a mixture of exponential kernels using intraday equity data from NASDAQ and NYSE.

In Ref. [2], Bacry et al. recently introduced a non-parametric estimation method for multivariate symmetric Hawkes processes based on the spectral factorization of the covariance matrix by means of the Hilbert transform. By calibrating a 1-dimensional Hawkes model to the occurrence of trades of the 10-year Euro-Bund future front contract over 75 trading days in 2009, they discovered two important empirical facts: (i) the model is very close to its stability threshold $\|\Phi\| = 1$, and (ii) the empirical kernel $\phi(t)$ is very well described, over a wide range of scales, by the power-law function

$$ \phi(t) = \frac{\alpha\beta}{(1+\beta t)^{1+\gamma}} \qquad (63) $$

with $\gamma \simeq 0$. The first observation directly concerns the level of endogeneity and the stability of financial markets, a problem that, as discussed in the next section, has since been addressed by Filimonov and Sornette and by Hardiman et al. [?]. The power-law nature of Hawkes kernels with an exponent $\gamma \simeq 0$ has been confirmed by subsequent studies, notably by Hardiman et al. on mid-price changes of E-mini S&P500 futures [34] and by Bacry and Muzy on trade arrivals of EuroStoxx index futures [5]. The plots of Fig. 6 are directly extracted from these papers: they represent, in log-log scale, the estimated Hawkes kernel for the E-mini S&P500 futures mid-price change events and for the EuroStoxx market-order occurrences. One can see that the two estimated kernels, corresponding to different data, different markets and different estimation methods, are strikingly similar. This suggests some universality of both the $\gamma$ and $C = \alpha\beta^{-\gamma}$ parameters in the algebraic decay of Eq. (63).
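For the parametrization $\phi(t) = \alpha\beta/(1+\beta t)^{1+\gamma}$ of Eq. (63), direct integration gives $\int_0^H \phi(t)\,dt = (\alpha/\gamma)\,[1 - (1+\beta H)^{-\gamma}]$, hence $\|\phi\| = \alpha/\gamma$, while $\phi(t) \sim C\,t^{-(1+\gamma)}$ for large t with $C = \alpha\beta^{-\gamma}$. With $\gamma$ close to 0, a large share of the kernel's mass therefore sits at long lags. The short Python sketch below evaluates these quantities; the parameter values are arbitrary and not fitted to any data.

```python
import math


def power_law_kernel(t, alpha, beta, gamma):
    """phi(t) = alpha * beta / (1 + beta * t) ** (1 + gamma), the form of Eq. (63)."""
    return alpha * beta / (1.0 + beta * t) ** (1.0 + gamma)


def power_law_norm(alpha, beta, gamma, horizon=math.inf):
    """Integral of phi over [0, horizon]; equals alpha / gamma when horizon is infinite."""
    if math.isinf(horizon):
        return alpha / gamma
    return alpha / gamma * (1.0 - (1.0 + beta * horizon) ** (-gamma))


alpha, beta, gamma = 0.09, 10.0, 0.1    # arbitrary illustrative values
C = alpha * beta ** (-gamma)             # tail amplitude: phi(t) ~ C * t**-(1 + gamma)
t_far = 1e8
print(power_law_kernel(t_far, alpha, beta, gamma) * t_far ** (1 + gamma) / C)  # close to 1
print(power_law_norm(alpha, beta, gamma))                   # ||phi|| = alpha / gamma, about 0.9
print(power_law_norm(alpha, beta, gamma, horizon=3600.0))   # mass accumulated on [0, 3600]
```

Since the partial mass approaches $\|\phi\|$ only as $(1+\beta H)^{-\gamma}$, truncating such a kernel at any practical horizon can noticeably bias the estimated branching ratio when $\gamma$ is small.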

The origin of this power-law behavior and, in particular, of the values of $\alpha$ and $\gamma$ remains an open question. Previous empirical results motivated the work by Jaisson [?] (see also Ref. [7] and the end of Sec. 2.3.6), who proved that, within a particular asymptotics (which includes the norm of the kernel converging to 1), a 1-dimensional kernel with a power-law decay with an exponent $1+\gamma$ (with $\gamma \in\, ]0, 0.5[$) leads to an auto-covariance function of the flow which is a power law with exponent $1-2\gamma$, i.e., to a long-range dependence of the flow (see also the discussion around Eq. (34) in Sec. 2.3). The algebraic decay of Hawkes kernels can then be related to the volatility clustering properties. Notice that, within trading-time models, the long-range nature of offer and demand has been mostly explained in terms of order-splitting dynamics, and is supported by empirical results such as [?], in which broker-resolved data is analyzed in order to distinguish between the herding and splitting components of the order flow. It is thus likely that the observed slowly decreasing nature of the kernel directly results from the splitting of meta-orders. Such a hypothesis, however, remains to be supported by quantitative arguments and confirmed by empirical observations. Beyond these fundamental questions, the power-law nature of Hawkes kernels remains a solid empirical fact which, at the very least, calls into question all the approaches based on exponential Hawkes models.

One can have in mind a simple price model where the price is built as the sum of independent random shocks. In this case, the variance of price changes at scale $\tau$ is proportional to the number of events $N_\tau$. The empirical results of Ref. [?] corroborate this observation, allowing one to establish more precisely the relation between impact per trade and volatility.

More precisely, Bowsher considered a generalization of Hawkes processes in order to account for the peculiar non-stationarities observed at high frequency, like intraday seasonalities and overnight gaps.

Kernel comparison

Figure 6: Inferred kernel $\phi(t)$ of the univariate Hawkes process describing mid-point changes of the E-mini S&P500 and trade arrivals of the EuroStoxx in different years, reproduced from Refs. [5] and [34]. Strikingly, considering the different markets, traded contracts, types of events and periods of time considered in these studies, the shape of $\phi(t)$ is almost identical.

Along the same lines as the previously cited studies, Da Fonseca and Zaatour [?] perform a parametric estimation (using a GMM approach, see Sec. C.1) of a 1-dimensional Hawkes process with exponential kernel of the form $\alpha\beta\exp(-\beta t)$ on (unsigned) market-order flow data. As expected, market-order clustering translates into a rather low value of $\beta \in [0.02, 0.1]$ (depending on the financial time-series), and the norm of the kernel is found to be very close to criticality, i.e., $\alpha > 0.9$ (and very often $\alpha > 0.95$). One can also cite the work of Lallouache and Challet, who performed a maximum likelihood estimation on market orders using a sum of two exponential functions as the Hawkes kernel. They study EBS forex data, which are throttled (market orders are gathered within slices of 0.1 seconds) and therefore call for a rather complex denoising preprocessing. Using goodness-of-fit tests, they conclude that, though the model performs well when applied to a specific 1-hour intraday window (averaged every day over 3 months), the intraday seasonality of (mainly) the exogenous intensity $\mu$ does not allow one to fit a whole day well.

Hawkes processes have also been used to model extreme price moves at a rather low frequency. In Ref. [27], Embrechts et al. study an equally weighted portfolio of 3 indices (Dow Jones, Nasdaq and S&P) on an hourly time-frame (over 14 years). Only extreme quantiles of returns are kept (the smallest and the largest 1% quantiles), leading to a 1-dimensional point process whose jumps correspond to an extreme return of any of the three indices. The jumps are then marked using 3-dimensional marks coding the excess of each index with respect to the corresponding 1% quantile. The so-obtained point process is modeled using a 3-dimensional Hawkes process (with an exponential kernel) marked by 3-dimensional Gamma-distributed i.i.d. random variables. A maximum likelihood estimation is performed, and goodness-of-fit tests show that this model is well suited for modeling extreme price moves. Let us point out that, in the same work, a very similar experiment is performed on daily log-returns of a (home-made) index of stocks, using this time a 2-dimensional Hawkes process (for coding positive or negative extreme jumps) with 1-dimensional marks. In the same spirit, in Ref. [?], Chavez-Demoulin and McGill constructed a model for the excesses of an asset price above a given threshold. This model combines a Hawkes process for the excess occurrences with Pareto-distributed marks for the excess sizes. Within this approach, the authors' goal was to describe the clustering of large drawdown events through the self-excited dynamics of the Hawkes process. By performing backtests on intraday equity data, they have shown that this model captures extreme intraday events very well, notably as compared to standard non-parametric methods based on extreme value theory.


3.2 Measuring the endogeneity of stock markets 


In a recent series of papers [?], some authors addressed, within the framework of Hawkes models, the important problem of the so-called “volatility puzzle”, namely the fact that the observed market volatility cannot be explained by classical economic theory. Indeed, it is well known that prices move too much compared to the flow of pertinent information that may impact the market. This observation naturally leads to the idea that price dynamics is highly endogenous, i.e., mainly driven by internal feedback mechanisms. Filimonov and Sornette [?] were the first to propose a quantitative measure of the level of “market reflexivity”. For that purpose, they model the high-frequency mid-price variations of a stock index (namely the E-mini S&P500) as a 1-dimensional Hawkes process. As explained in Sec. 2.3.7, $\|\Phi\|$ can be interpreted as a branching ratio, i.e., the average number of events generated by any parent event. Each exogenous event, occurring at rate $\mu$, thus generates on average $\|\Phi\|/(1-\|\Phi\|)$ events, and therefore the ratio of the endogenous event rate to the overall rate $\Lambda$ in one dimension is, according to Eq. (21),

$$ \frac{1}{\Lambda}\,\frac{\mu\,\|\Phi\|}{1-\|\Phi\|} = \|\Phi\|. $$


This means that $\|\Phi\|$ provides a direct measure of the fraction of endogenous events within the whole population of mid-price changes, and thus a measure of market reflexivity. By analyzing the E-mini S&P500 future contracts over the period 1998-2010, Filimonov and Sornette found that the degree of reflexivity has strikingly increased during the last decade. They suggested that this effect could be directly caused by the increasing amount of high-frequency and algorithmic trading, raising the question of the impact of high-frequency trading on market stability. This analysis has been revisited by Hardiman, Bercot and Bouchaud [34], who noticed that the estimation of Filimonov and Sornette, relying on an exponential parametrization, is biased because, as discussed previously, empirical evidence suggests that Hawkes kernels have a slow (power-law) decay. Accounting for this feature in their estimation of $\|\Phi\|$, Hardiman et al. found that the reflexivity of the E-mini S&P future has not been increasing during the last decade, but has remained constant at a value very close to the critical one, $\|\Phi\| = 1$. It is noteworthy that Hardiman et al. also provided, in their study, empirical evidence of the existence of a high-frequency cut-off of the Hawkes kernel (set by the parameter $\beta$ in Eq. (63)) that decreases exponentially fast in time and that can be associated with the increase of the trading frequency. In a more recent paper, Filimonov and Sornette [32] have reviewed all the pitfalls associated with the estimation of $\|\Phi\|$ in the case of a slowly decreasing kernel. They have shown that significant biases can be induced by the presence of outliers, edge effects or non-stationary effects. Because some important issues are also related to the way one parametrizes the model (notably the choice of the high-frequency regularization), Hardiman and Bouchaud [35] proposed a simple non-parametric approximation of the branching ratio $\|\Phi\|$ that relies on Eq. (23).


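Indeed, by considering this equation at $z = 0$, it is possible to relate the integral of the correlation function, $\hat{c}(0)$, to $\|\Phi\|$: in one dimension, $\hat{c}(0) = \Lambda/(1-\|\Phi\|)^2$. Denoting by $N_T$ the number of events in a window of size T, one has, for T large enough, $\hat{c}(0) \simeq T^{-1}\,\mathbb{V}[N_T]$ and $\mathbb{E}[N_T] = \Lambda T$, and therefore

$$ \|\Phi\| \simeq 1 - \left(\frac{\mathbb{E}[N_T]}{\mathbb{V}[N_T]}\right)^{1/2}. \qquad (64) $$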


This formula leads to a very intuitive interpretation of the degree of reflexivity: the occurrence of correlated events implies an increase of the variance of $N_T$ with respect to its mean value (for a Poisson process, both quantities are equal, so that one directly gets $\|\Phi\| = 0$). Using this model-free estimator, Hardiman and Bouchaud [35] have confirmed their former claim that the S&P 500 future appears to have had, during the last ten years, a stable level of reflexivity, close to criticality. Let us notice that this formula only holds for 1-dimensional Hawkes processes and has no simple extension to the multivariate situation.
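Eq. (64) suggests a model-free recipe: count events in consecutive non-overlapping windows of length T and compare the empirical mean and variance of the counts. The Python sketch below is one straightforward reading of that recipe (the window layout and the use of the population variance are our own choices, not necessarily those of Ref. [35]); T should be large compared with the kernel's range for the asymptotic relation to apply, and, as noted above, the estimator is meaningful only in one dimension.

```python
import random
from statistics import fmean, pvariance


def branching_ratio_estimate(event_times, window, t_end):
    """Eq. (64): ||Phi|| ~ 1 - sqrt(E[N_T] / V[N_T]), with N_T counted in
    consecutive, non-overlapping windows of length `window` covering [0, t_end)."""
    n_windows = int(t_end // window)
    counts = [0] * n_windows
    for t in event_times:
        k = int(t // window)
        if 0 <= k < n_windows:
            counts[k] += 1
    mean, var = fmean(counts), pvariance(counts)
    if var == 0:
        raise ValueError("zero count variance: estimator undefined")
    return 1.0 - (mean / var) ** 0.5


# sanity check on a Poisson process: mean and variance agree, estimate near 0
rng, t, times = random.Random(2), 0.0, []
while t < 10000.0:
    t += rng.expovariate(1.0)
    times.append(t)
print(branching_ratio_estimate(times, window=50.0, t_end=10000.0))
```

The estimate can come out slightly negative for Poisson-like data because of sampling noise in the variance; with few windows this noise, together with the edge and non-stationarity effects mentioned above, dominates.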

Beyond the debate on the most suitable estimator for the reflexivity parameter and its genuine behavior, the pioneering work of Filimonov and Sornette provided a quantitative framework for studying the endogeneity of market fluctuations with Hawkes processes. They notably showed that such an approach can be used to study particular events such as the flash crashes of April and May 2010 [?]. Their results may be helpful to devise warning tools in order to anticipate extreme drawdowns of endogenous origin. The prospects and applications along this path are numerous. One important question concerns the extension of such studies to account for other types of events, e.g., order book events.


4 Price models 

Describing the fluctuations of price at the finest time scales, notably with the goal of improving volatility and covariance estimations, is a central issue of financial econometrics. The notion of microstructure noise was considered by many authors as an additional noise, superimposed on the standard diffusion, that accounts for the small-scale behavior of the signature plot. Indeed, it is well known that the signature plot

$$ C(\tau) = \frac{1}{T}\sum_{i=0}^{T/\tau-1} \big(P_{(i+1)\tau} - P_{i\tau}\big)^2, $$

which corresponds to the quadratic variation of the mid-price $P_t$ at scale $\tau$, strongly increases when $\tau \to 0$ (see Fig. 7). Along the same line, the so-called Epps effect accounts for the vanishing covariation among the returns of pairs of assets when the return scale $\tau$ goes to zero. Bacry et al. [3] proposed, as an alternative to standard “latent price” models, to directly account for the discrete nature of price variations. They were the first to describe the tick-by-tick variation of the mid-price $P_t$ within the framework of Hawkes processes. In order to do so, these authors considered the two counting processes $N^+_t$ and $N^-_t$ associated with, respectively, the arrival times of upward and downward price changes, and set

$$ P_t = P_0 + N^+_t - N^-_t, \qquad t \ge 0, $$


where the couple $(N^+, N^-)$ is a 2-dimensional Hawkes process. Since, in a first approximation, the dynamics of the upward moves and of the downward moves are expected to be the same, it is natural to consider a symmetric matrix kernel $\Phi$, like the bivariate one of Sec. 2.1, with equal diagonal terms $\phi^{(s)}$ and equal anti-diagonal terms $\phi^{(c)}$.

It is well known that, at the microstructure level, the price is essentially mean reverting; thus in [3] the authors considered the “purely mean-reverting scenario”, i.e., the case where $\phi^{(s)} = 0$, and chose an exponential shape for $\phi^{(c)}$. Within this simple framework, Bacry et al. provided a closed-form expression for the signature plot. Calibrating the model using MLE or GMM estimations on Euro-Bund and Euro-Bobl future data, they have shown that the model is able to reproduce the scale behavior of the signature plot (see Fig. 7). Bacry et al. also considered a natural extension of the previous model to a 4-dimensional Hawkes model in order to describe the joint mid-price dynamics of a pair of assets and to reproduce the Epps effect [3]. Let us recall that, in Ref. [1], the authors established that, under general conditions, the empirical covariation of a multivariate Hawkes process converges towards its expected value, which can be easily expressed in terms of the Hawkes covariance matrix $c(t)$ as given by Eq. (23). This result allows one, within the Hawkes price model of Bacry et al., to provide analytical expressions for the signature plot, the lead-lag behavior and the Epps effect in terms of the Hawkes matrix kernel $\Phi$ [?]. Let us also mention that, along the lines of the previous model, the symmetric case was considered in [?], i.e., the “purely trend-following scenario”: $\phi^{(c)}(t) = 0$ and $\phi^{(s)}(t)$ with an exponential shape. The authors compared this scenario with the “purely mean-reverting scenario” when used for daily volatility estimation with the diffusive formula (53). They showed that the mean-reverting (resp. trend-following) scenario underestimates (resp. overestimates) the volatility, and concluded that a “full” model with both cross and self terms is more realistic and should lead to better volatility estimation. Let us notice that the existence of non-negligible diagonal and anti-diagonal terms has been confirmed by non-parametric estimations performed thereafter in [?].
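To make the definition of the signature plot concrete, the Python sketch below evaluates $C(\tau)$ for a price path built as $P_t = P_0 + N^+_t - N^-_t$ from lists of up- and down-jump times (function names are hypothetical). The toy path is invented for illustration and is not Hawkes-generated: every up-tick is undone one time unit later, an extreme form of the mean reversion discussed above, so $C(\tau)$ is large at small $\tau$ and vanishes at $\tau = 2$.

```python
from bisect import bisect_right


def make_price(up_times, down_times, p0=0):
    """P_t = P_0 + N^+_t - N^-_t (right-continuous counting functions)."""
    up, down = sorted(up_times), sorted(down_times)
    return lambda t: p0 + bisect_right(up, t) - bisect_right(down, t)


def signature_plot(price, T, tau):
    """C(tau) = (1/T) * sum_{i=0}^{T/tau - 1} (P_{(i+1)tau} - P_{i tau})^2."""
    n_steps = int(T // tau)
    return sum((price((i + 1) * tau) - price(i * tau)) ** 2 for i in range(n_steps)) / T


# toy "bid-ask bounce": every up-tick is undone one time unit later
up = [0.5 + 2 * k for k in range(5)]
down = [1.5 + 2 * k for k in range(5)]
price = make_price(up, down)
print([signature_plot(price, 10.0, tau) for tau in (1.0, 2.0, 5.0)])  # [1.0, 0.0, 0.2]
```

In the Hawkes model of Bacry et al., the cross-excitation term plays the role of this deterministic bounce in a stochastic way, which is what produces the decreasing signature plot.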

One also expects the involved kernels not to be exponential functions. Indeed, as shown in many works (e.g., in Ref. [?]) and as discussed previously in Sec. 3.1, empirical self-exciting kernels are closer to a power law than to an exponential. It is with that consideration in mind that Jaisson and Rosenbaum [42] developed the diffusive framework that we have already described in the second part of Sec. 2.3.6. It is an alternative to the more classical Brownian motion diffusive framework developed in the first part of the same section. In the second part of their paper, Jaisson and Rosenbaum used their framework to build a version of the 2-dimensional Hawkes model initially introduced in [3] that converges at large scales towards the Heston price model [39], which displays volatility clustering. This work can be seen as the very first step towards a unified “across scales” model that would fit both the microstructure stylized facts of price (i.e., a point process with strong mean reversion) and the “diffusive” stylized facts (volatility clustering and multifractality). In that sense, this is a very promising work.

Figure 7: Reproducing the mid-price behavior at the microstructural level using the 2-dimensional Hawkes model of Bacry et al. The plots represent the Euro-Bund empirical signature plot as compared to its fit within the model of Bacry et al. The figure has been reproduced from Ref. [3].

We can also mention a very interesting generalization of the model introduced by Bacry et al. in [3]. In [69], Zheng et al. introduced a model for the coupled dynamics of the best bid and ask prices. It starts by coding each best price using the Bacry et al. model, leading to a 4-dimensional price model. The main difficulty comes from the fact that one needs to encode in the model that the ask price must lie strictly above the bid price. This is achieved through a spread point process whose dynamics is coupled with the dynamics of both best prices. It measures the distance (in ticks) between the ask price and the bid price: the spread is increased by 1 each time the ask price jumps upward or the bid price jumps downward, and decreased by 1 each time the ask price jumps downward or the bid price jumps upward. Finally, a non-linear term is introduced in the Hawkes model: the intensities of the downward (resp. upward) jumps of the ask (resp. bid) price are set to 0 as soon as the spread process is equal to 1. The authors of [69] developed a whole new rigorous framework for this constrained, non-linear Hawkes model, in which they were able to establish several properties (including a diffusive limit). They performed maximum likelihood estimation on real data (using exponential kernels) and showed that they were able to reproduce the signature plot rather well.
Let us finally cite the work of Fauth and Tudor [?], where the authors proposed to describe the bid and ask prices of an asset (or a pair of assets) within the framework of marked multivariate Hawkes models. Motivated by the empirical observation (performed on high-frequency Euro/USD and Euro/GBP FX rates from 30-01-2012 to 10-03-2012) that, as the transaction volumes increase, the inter-trade durations decrease, the authors proposed to consider, in addition to an exponential Hawkes kernel, a multiplicative mark (in the marked framework of Sec. 2.2.1) given by a power-law function of the volume $v$, of the form $C v^{\kappa}$. Their model thus describes the events corresponding to an increase/decrease of the bid/ask as a four-dimensional Hawkes process marked by transaction volumes. By adding suitable constraints in order to avoid an infinite spread, they calibrated the model on FX rate data by a maximum likelihood approach. Fauth and Tudor have shown that their model is consistent with empirical data by reproducing the signature plots of the considered assets and the behavior of the high-frequency pair correlation function (Epps effect).


5 Impact models 

5.1 Market impact modeling 

Market impact modeling is a longstanding problem in the market microstructure literature and is obviously of great interest both for theoreticians and for practitioners (see, e.g., [13] for a recent review). While for the former market impact reflects the mechanism enforcing market efficiency, allowing prices to reflect fundamental information, for the latter it represents a cost which needs to be carefully minimized when executing an order. For a trader, market impact induces extra costs per transaction which add to the fees charged directly by the market, forcing him to split large orders and trade them incrementally as sequences of smaller child orders. Such a sequence of orders is called a meta-order, and quantifying the effect of meta-orders on prices is at the heart of market-microstructure regulation discussions.

The theory of market price formation and of the relationship between the order flow and price changes has made significant progress during the last decade thanks to the increasing availability of intraday data [14]. Many empirical studies have provided evidence that price impact has, in many respects, some universal properties and is the main source of price variations. This corroborates the picture of an “endogenous” nature of price fluctuations, which contrasts with the classical scenario according to which an “exogenous” flow of information drives prices towards a fundamental value [14].

If a meta-order is placed at time t = 0 and executed until time t = T, an associated market impact curve can be defined by a proxy of the price variation it directly or indirectly causes*. One generally distinguishes two phases: an increasing (concave) part during the execution of the meta-order (i.e., on the interval [0,T]), followed by a decaying (generally convex) resilient part. The existence of permanent impact, i.e., a nonzero asymptotic value (for large time) of the market impact curve, is a central problem that remains under debate. The typical shape of a market impact curve is shown in Fig. 8, which was obtained by Bacry et al. in [5] by averaging empirical impact curves over a large database of broker meta-orders. Let us point out that this type of measure of market impact cannot be obtained using anonymous market data (i.e., one cannot easily identify the meta-orders of a given agent). This, together with the very low intensity of the price signal with respect to its statistical fluctuations, is the reason why empirical results in this respect have been obtained only in relatively recent years.

* We focus here on the impact of meta-orders, rather than on the impact of individual orders or of trade imbalance, which have also received considerable interest in the literature.



Figure 8: Averaged empirical market impact curves (normalized in time) over a large database of broker meta-orders. The fit is performed using the impulsive-HIM model. This figure has been reproduced from [5].


Bacry and Muzy recently proposed a price impact model based on Hawkes processes [7]. They suggested accounting directly for the joint dynamics of the mid-price and of market order occurrences. More precisely, the four dimensions of their Hawkes model correspond to the upward and downward mid-price jumps and to the buying and selling market order flows (they accounted neither for the volumes of the orders nor for the sizes of the price jumps). The kernel matrix $\Phi$ can then be decomposed into four sub-blocks of size 2×2. The first one describes the self-excitement of the market order flow, the second one the self-excitement of the price, the third one the (market) impact of trades on the price, while the last one accounts for the feedback influence of price moves on the market order flow intensity. By calibrating the model directly from anonymous high-frequency data (on the most liquid maturity of EuroStoxx and Euro-Bund futures contracts over 800 trading days from 2009 to 2012, using the Wiener-Hopf non-parametric estimation technique described in App. C), the authors of [7] were able to disentangle the self- and cross-excitation dynamics of mid-price changes from the impact of market orders. In particular, they showed that the market impact block is mainly diagonal: buying (resp. selling) orders mostly trigger upward (resp. downward) price moves. Moreover, the diagonal impact function is very localized around t = 0: a market order no longer directly impacts the price after a very short delay (less than 0.1 s). They also showed that the feedback sub-block is mostly anti-diagonal, with slightly negative diagonal kernel functions (see Sec. 2.2.4 for a short discussion of negative kernels). This indicates that an upward (resp. downward) jump in the price tends to increase the intensity of the selling (resp. buying) market order flow and to decrease the intensity of the buying (resp. selling) market order flow. The shape of the kernels involved in the self-excitation of the market order flow or of the price jumps confirms former estimations performed within lower-dimensional models: long-range correlation of the signs of the trades and mainly long-range mean-reversion of the price. All these results were confirmed (using a database with a much higher time precision) in the work of [3] (see Sec. 6.1). Bacry and Muzy established that, within their framework, it is possible to determine the entire impact profile of a meta-order, built from the “bare” direct localized market impact and from a “dressed” impact that mainly involves the power-law self-excitation of trades. They provided analytical expressions for this curve and notably related its shape, in both the increasing and the decreasing phase, to the power-law behavior of the self-exciting kernel of market order events. Using the previously described empirical findings, they showed that one can recover the typical shape depicted in Fig. 8.
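The block structure of the kernel matrix can be made concrete with a small sketch that assembles a 4×4 matrix of kernel norms from its four 2×2 sub-blocks. The component order [T+, T-, P+, P-], the function names and all numerical values are illustrative assumptions that only mimic the qualitative shapes reported above; they are not the estimates of [7]. The spectral-radius check on absolute values is a sufficient stationarity condition only under the assumption that each kernel keeps a constant sign.

```python
import numpy as np

def assemble_norm_matrix(tt, tp, pt, pp):
    """Assemble the 4x4 matrix of kernel norms from its 2x2 blocks.

    Assumed component order: [T+, T-, P+, P-] (buy trades, sell trades,
    upward and downward mid-price jumps). Entry (i, j) is the norm of the
    kernel describing the influence of type-j events on type-i events.
    tt: trades on trades (self-excitement of the order flow)
    tp: price moves on trades (feedback)
    pt: trades on price (impact)
    pp: price moves on price (self-excitement of the price)
    """
    return np.block([[tt, tp], [pt, pp]])

def stability_margin(K):
    """Spectral radius of |K|; below 1 is a sufficient stationarity check
    when each kernel has a constant sign (some may be negative)."""
    return max(abs(np.linalg.eigvals(np.abs(K))))

# Illustrative numbers only, chosen to mimic the qualitative shapes described above
tt = np.array([[0.60, 0.05], [0.05, 0.60]])    # mostly self-exciting order flow
tp = np.array([[-0.02, 0.20], [0.20, -0.02]])  # anti-diagonal feedback, slightly negative diagonal
pt = np.array([[0.15, 0.01], [0.01, 0.15]])    # mainly diagonal impact
pp = np.array([[0.00, 0.30], [0.30, 0.00]])    # anti-diagonal: mean reversion of the price
K = assemble_norm_matrix(tt, tp, pt, pp)
print(K.shape, stability_margin(K))
```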

In [5], an impact model based on the 2-dimensional Hawkes price model described in Sec. 2.1 (in which only the mean-reversion influence is kept, i.e., $\phi^{(s)}(t) = 0$) has been introduced. It takes into account the impact of an exogenous (buying) meta-order strategy $r(t)$, where $r(t)$ is the trading rate of the (buying) strategy per unit of time (so that $r(t)dt$ is the number of shares bought between times $t$ and $t + dt$). More precisely, the resulting Hawkes Impact Model (HIM) reads

$$\lambda^{+}_t = \mu + \phi^{(c)} \star dN^{-}_t + \phi^{(I)} \star f(r(t)) \quad \text{and} \quad \lambda^{-}_t = \mu + \phi^{(c)} \star dN^{+}_t + \phi^{(x)} \star f(r(t)), \qquad (65)$$

where $dN^{+}_t$ (resp. $dN^{-}_t$) codes the upward (resp. downward) jumps of the price and $f(r(t))dt$ (with $f(0) = 0$) codes the infinitesimal impact of a buy order of volume $r(t)dt$. The function $f$ is the instantaneous impact function, and $\phi^{(I)}$ and $\phi^{(x)}$ are respectively the impact kernel and the cross-impact kernel* (the latter describes the impact of a buying order on downward jumps; of course, we expect that $\|\phi^{(x)}\| \ll \|\phi^{(I)}\|$). Following the empirical findings of [7], the impulsive-HIM model corresponds to the particular choice of an impulsive (very localized) impact kernel $\phi^{(I)}(t) = \delta(t)$, i.e., a Dirac distribution. Moreover, as for the choice of $\phi^{(x)}$, the impulsive-HIM model considers that the market reacts to the newly arrived order as if it had triggered an upward jump: $\phi^{(x)}(t) = C\,\phi^{(c)}(t)$. The nonnegative constant $C$ is a very intuitive parameter that quantifies the ratio between the “contrarian” reaction (i.e., impact decay) and the “herding” reaction (i.e., impact amplification). Analytical formulas for the market impact curve were obtained in [5], where three cases of interest for $C$ were distinguished: $C = 0$ corresponds to no contrarian reaction (strong permanent impact), $C = 1$ corresponds to a contrarian reaction as “strong” (in terms of the norm) as the herding one (no permanent impact) and finally $C \in\, ]0,1[$ corresponds to a contrarian reaction which is nonzero but strictly weaker than the herding reaction. Fig. 8 shows a fit of the empirical market impact curve using this model with a power-law microstructure kernel ($\phi^{(c)}(t)$ decaying as a power law when $t \to +\infty$), in which case the market impact curve is proven to decay to the permanent market impact value as a power law.
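Because the HIM intensities are linear, the expected price change $\mathbb{E}[N^+_t - N^-_t]$ can be computed numerically. With $\phi^{(I)} = \delta$ and $\phi^{(x)} = C\,\phi^{(c)}$, taking expectations shows that $D(t) = \mathbb{E}[\lambda^+_t - \lambda^-_t]$ solves the Volterra equation $D = f(r) - C\,\phi^{(c)} \star f(r) - \phi^{(c)} \star D$. The sketch below discretizes it with Riemann sums on a regular grid; the exponential kernel, the square-root impact function and the constant-rate schedule are hypothetical choices made only for illustration (the fit reported in the text uses a power-law kernel).

```python
import numpy as np

def him_expected_impact(rate, f, phi_c, C, T, h):
    """Expected impact E[N+_t - N-_t] in a discretized impulsive-HIM sketch.

    rate  : trading rate r(t) of the buying meta-order (0 outside execution)
    f     : instantaneous impact function with f(0) = 0
    phi_c : cross (mean-reversion) kernel phi^(c), evaluated at positive lags
    C     : contrarian constant, so that phi^(x) = C * phi^(c)
    With phi^(I) = delta, D(t) = E[lambda+_t - lambda-_t] solves
        D = f(r) - C * (phi_c * f(r)) - phi_c * D,
    solved here step by step on a grid of step h.
    """
    t = np.arange(0.0, T, h)
    n = len(t)
    fr = np.array([f(rate(s)) for s in t])
    kern = phi_c(h * np.arange(1, n + 1))  # kern[m - 1] = phi_c(m * h)
    D = np.zeros(n)
    for k in range(n):
        past = kern[:k][::-1]              # phi_c((k - j) * h) for j = 0, ..., k - 1
        conv_f = h * np.dot(past, fr[:k])  # (phi_c * f(r))(t_k)
        conv_D = h * np.dot(past, D[:k])   # (phi_c * D)(t_k)
        D[k] = fr[k] - C * conv_f - conv_D
    return t, h * np.cumsum(D)             # cumulative expected price change

# Hypothetical example: unit rate during [0, 1), square-root impact, exponential kernel
beta, norm = 1.0, 0.8
phi_c = lambda s: norm * beta * np.exp(-beta * s)
rate = lambda s: 1.0 if s < 1.0 else 0.0
f = lambda r: np.sqrt(r)
t, impact = him_expected_impact(rate, f, phi_c, C=0.5, T=20.0, h=0.01)
print(impact.max(), impact[-1])
```

Swapping in a power-law `phi_c` only requires a different function; the step `h` should stay small compared with the kernel's characteristic time, and the horizon `T` long enough for the curve to settle.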

Let us point out that, in a totally different framework, Jaisson [41] linked, in the asymptotic limit defined by Jaisson and Rosenbaum in [32] (see Sec. 2.3.6), the power-law exponent of the self-excitement kernel involved in the market order flow to the power-law exponent of the market impact decay. He derived his results within a 2-dimensional Hawkes model (with only self-excitement kernels) for the market order flow, from (i) a price martingale hypothesis and (ii) a linear market impact hypothesis.

* As for (s) and (c), (I) stands for impact and (x) for cross-impact.


5.2 Optimal execution 


A natural application of price impact models is the design of optimal liquidation strategies. Hewlett [10] was the first to address this problem using Hawkes models. He proposed to model the occurrence of buy and sell market orders on FX markets using a bivariate exponential Hawkes process. He found that these events are mostly self-excited, the cross-excitation intensity between buy and sell events being negligible. Using Eq. (47), Hewlett determined the expected future trade imbalance which, within a linear price impact model, allows one to determine the expected future price returns and the associated risk. He then showed that this approach allows one to devise a liquidation strategy that maximizes a mean-variance utility function.
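For an exponential kernel $\alpha e^{-\beta t}$ with $\alpha < \beta$, the conditional mean intensity relaxes exponentially towards its stationary level, which gives a closed form for the expected number of future trades. The sketch below uses this standard result to compute the expected trade imbalance over a horizon for two independent (non-cross-exciting) buy and sell processes, in the spirit of Hewlett's setting; it is not a verbatim transcription of Eq. (47), and all parameter values are hypothetical. Within a linear impact model, the expected price move would be this imbalance multiplied by an impact coefficient (not shown).

```python
import math

def current_intensity(event_times, now, mu, alpha, beta):
    """lambda(now) = mu + sum of alpha * exp(-beta (now - s)) over past events s <= now."""
    return mu + sum(alpha * math.exp(-beta * (now - s)) for s in event_times if s <= now)

def expected_count(lam_now, mu, alpha, beta, horizon):
    """E[N(now + horizon) - N(now) | past] for the kernel alpha * exp(-beta t).

    The conditional mean intensity m(tau) solves m' = beta * mu - (beta - alpha) * m,
    so it relaxes exponentially from lam_now to lam_inf = beta * mu / (beta - alpha).
    """
    if alpha >= beta:
        raise ValueError("need alpha < beta (kernel norm below 1)")
    k = beta - alpha
    lam_inf = beta * mu / k
    return lam_inf * horizon + (lam_now - lam_inf) * (1.0 - math.exp(-k * horizon)) / k

def expected_imbalance(buy_times, sell_times, now, horizon, buy_params, sell_params):
    """Expected (buy count - sell count) over (now, now + horizon] for two independent
    exponential Hawkes processes; each params tuple is (mu, alpha, beta)."""
    lam_b = current_intensity(buy_times, now, *buy_params)
    lam_s = current_intensity(sell_times, now, *sell_params)
    return (expected_count(lam_b, *buy_params, horizon)
            - expected_count(lam_s, *sell_params, horizon))

# Hypothetical parameters and a recent burst of buy orders
params = (0.5, 0.6, 1.0)
buys, sells = [9.0, 9.5, 9.8], [3.0]
imb = expected_imbalance(buys, sells, now=10.0, horizon=5.0,
                         buy_params=params, sell_params=params)
print(imb)  # positive: the recent buy burst is expected to keep generating buys
```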

Applications of Hawkes models to optimal execution have also been considered more recently by Alfonsi and Blanc [1], who modeled the price process using a linear impact of liquidity takers. More precisely, the price is decomposed as the sum of a “fundamental” price, whose variations are proportional to a fraction of the trade imbalance, and a “transient” price that is moved by the remaining fraction of the trade imbalance but with a damping term that represents the market resiliency caused by the market makers' behavior. Alfonsi and Blanc provided explicit expressions of the optimal liquidation strategy (i.e., the one with the minimum expected cost) when the flow of buy/sell market orders that impact the price is either a Poisson process or a 2-dimensional Hawkes process with a symmetric exponential kernel matrix. They notably showed that price manipulation strategies (i.e., liquidation strategies with negative expected cost) always exist for a Poisson model, while they can be excluded in the Hawkes model provided its parameters meet some specific conditions. The first of these conditions states that the self-excitation should exactly compensate the price resiliency, so that the resulting price is a martingale. The other condition identifies the fraction of endogenous orders within the Hawkes model with the proportion of market orders involved in the transient part of the price behavior.
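A minimal sketch of the fundamental/transient decomposition described above: each signed trade moves the fundamental price permanently by a fraction `1 - nu` of its linear impact, while the remaining fraction `nu` enters a transient component that decays exponentially at rate `rho`. The parameter names, the exponential form of the resilience and the values are assumptions for illustration, not the exact specification of Alfonsi and Blanc.

```python
import math

def price_path(trades, grid, p0=100.0, kappa=0.01, nu=0.5, rho=1.0):
    """Price = p0 + fundamental part + transient part (a sketch under assumed parameters).

    trades : list of (time, signed_volume), buys positive, sells negative
    kappa  : linear impact per unit of signed volume (hypothetical)
    nu     : fraction of the impact that is transient (hypothetical)
    rho    : exponential resilience rate of the transient part (hypothetical)
    """
    prices = []
    for t in grid:
        past = [(s, v) for s, v in trades if s <= t]
        fundamental = sum((1.0 - nu) * kappa * v for s, v in past)
        transient = sum(nu * kappa * v * math.exp(-rho * (t - s)) for s, v in past)
        prices.append(p0 + fundamental + transient)
    return prices

path = price_path([(0.0, 10.0), (1.0, -4.0)], grid=[0.0, 0.5, 1.0, 10.0, 50.0])
print([round(p, 4) for p in path])
```

In this sketch, after a single trade of signed volume `v` the price relaxes from `p0 + kappa * v` to `p0 + (1 - nu) * kappa * v`.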


6 Orderbook models 

Modeling faithfully the occurrence of the various types of orders in the order book, with the aim of understanding the mechanisms at the origin of price formation, volatility and liquidity variations, is probably the main challenge that applications of Hawkes processes in financial econometrics have to face. Although this goal is far from being reached, some authors have already tackled the problem and made significant progress on these issues.

6.1 Level-I book models 

The Level-I book description concerns the events that occur exclusively at the best bid and best ask levels of the order book. Even if this approach discards most of the book information, one can expect this level of description to be rich enough to capture most of the market features, since the best bid and best ask values are directly related to the asset mid-price and market orders mostly impact the book at level I. The study of Biais et al. E] indeed confirms that most of the activity of an order book takes place close to the best quotes, while Cont et al. da indicate that a substantial part of the price dynamics can be accounted for by the evolution of the best bid and best ask only.

One of the first applications of Hawkes models to order-book modeling at level I was performed by Large [46], who formalized the concept of book resiliency, namely the ability of the order book to replenish after being depleted by a large trade. Large suggested quantifying resiliency by the way large trades alter the future intensities of order occurrences. For that purpose, he introduced the response kernel

$$G^{ij}(t)\,dt = \mathbb{E}\left[dN^{i}_{t} \mid \mathcal{F}_{0^-}, dN^{j}_{0} = 1\right] - \mathbb{E}\left[dN^{i}_{t} \mid \mathcal{F}_{0^-}\right]. \qquad (66)$$

$G^{ij}(t)$ simply describes the increase of the future conditional intensity of events of type $i$ caused, directly or indirectly, by the occurrence at time $t = 0$ of an event of type $j$. In the case of a Hawkes process with kernel matrix $\Phi$, Large proved that $G$ satisfies the integral equation defining $\Psi(t)$. In other words, the matrix $\Psi(t)\,dt$ can be interpreted as the increase of the expected number of events after a lag $t$ caused by the occurrence of some event at time 0. Large then considered order book data from the LSE, modeled as a 10-dimensional Hawkes process where the book events are classified according to whether or not they move the mid-price: market and limit orders that move the mid-price (4 components if one distinguishes bid and ask), market and limit orders that leave the mid-price unchanged (4 components) and cancel orders (2 components). The impact of “aggressive” orders on the rate of forthcoming events is then studied. The processed data consisted of LSE stock data (Barclays equity) timestamped at a resolution of 1 s during the 22 trading days of January 2002. Large used MLE within the class of exponential kernels in order to estimate $\Phi$. His results allowed him to provide a “causal” interpretation (Large prefers the term “precipitation” to “cause”) of the main event occurrences in the book. He mainly found that aggressive limit orders are principally caused by aggressive market orders, thereby measuring market resiliency in all its aspects: magnitude, trade direction and characteristic time. Consistently with former studies, Large estimated that the studied stock is resilient in less than 40% of cases and that, when it is, the book replenishment occurs within a time frame of around 20 s. He also showed that the market order dynamics is mostly self-excited and correlated over long time scales. Aggressive market orders are also triggered by aggressive limit orders as a consequence of the “race to liquidity”.
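Assuming $\Psi$ denotes $\sum_{n \geq 1} \Phi^{\star n}$, so that $\Psi = \Phi + \Phi \star \Psi$, integrating over time turns the response kernel into a simple matrix computation: if $K$ is the matrix of kernel norms, the integrated response is $\sum_{n \geq 1} K^n = (I - K)^{-1} - I$, provided the spectral radius of $K$ is below 1. The sketch below computes it; the 2×2 matrix is hypothetical and is not Large's estimate.

```python
import numpy as np

def total_response(K):
    """Integrated response ||Psi|| = sum_{n>=1} K^n = (I - K)^{-1} - I.

    K[i, j] is the norm of the kernel phi^{ij} (effect of type-j events on type i).
    Entry (i, j) of the result is the expected total number of type-i events
    triggered, directly or indirectly, by a single type-j event.
    """
    K = np.asarray(K, dtype=float)
    if max(abs(np.linalg.eigvals(K))) >= 1.0:
        raise ValueError("spectral radius of K must be < 1 (stationary regime)")
    eye = np.eye(K.shape[0])
    return np.linalg.inv(eye - K) - eye

# Hypothetical 2x2 example: aggressive market orders (0) and aggressive limit orders (1)
K = np.array([[0.5, 0.1],
              [0.3, 0.2]])
R = total_response(K)
print(R)
```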

A similar analysis of level-I order book data was recently conducted by Bacry et al. [6]. These authors made a slightly different categorization than Large and distinguished all book events (market, limit and cancel orders) that leave the mid-price unchanged (6 components accounting for bid and ask sides) from events that move the mid-price up or down (2 components). The dynamics of these event occurrences was then modeled as an 8-dimensional Hawkes process. Unlike Large, Bacry et al. performed a non-parametric estimation of the matrix of kernels using the method described in Sec. [01. They considered book data time-stamped at a resolution of $10^{-6}$ s, and the analysis was performed up to lags of a few minutes. The event dynamics was thus considered over a range of time scales spanning close to eight decades. The main result reported by the authors is that the book event dynamics is mainly self-exciting, except for the mid-price changes, for which cross-excitation effects strongly dominate. This is illustrated in Fig. 9, where the values of the estimated kernel norms are reported using a color map: one can see that the resulting matrix is mainly diagonal except in the price sub-block, which is anti-diagonal. The observation that price change events are mainly triggered by price change events is in agreement with Large's previously reported results. Moreover, the observations of Bacry et al. confirmed the previously reported empirical fact that this triggering effect is mostly anti-diagonal, i.e., present price changes impact future price changes in the opposite direction. Previous findings obtained with low-dimensional models concerning the kernel shapes were also confirmed: the market is highly endogenous, whatever the type of event one considers, and all the dominating kernels (the diagonal kernels for orders leaving the price unchanged and the anti-diagonal kernels for mid-price moves) are slowly decreasing, well described by a power-law behavior as in Eq. (63). The richness of the Hawkes model of [6] allowed the authors to account for a richer dynamical behavior than previous works and to describe and quantify the high-frequency influences between all types of events. They notably characterized the impact of price changes on the book event flow, a quantity that turns out to be very sensitive to the asset tick size (as estimated by the probability that the mid-price moves). They also provided evidence of some inhibitory effects resulting from negative values of some Hawkes kernels. For example, it was observed that, for a large-tick asset (like the Euro-Bund futures), an upward price move not only triggers forthcoming trades at the bid but also inhibits trades on the ask side.

Figure 9: Empirical determination of the matrix of Hawkes kernel norms in the 8-dimensional model of level-I book events. P stands for mid-price change events, T for trade events, L for limit order events and C for cancel events. The superscripts (a) and (b) indicate the direction, ask or bid, of the events. This figure is reproduced from [6].

Restricting Large's model to the market order flows at the best bid and best ask, Muni Toke and Pomponio [56] studied the dynamics of trade-through orders. A market order is a trade-through if part of it is executed at the next best limit. For that purpose, they used the 2-dimensional Hawkes process (with exponential kernels) described by Eq. (53). Though very rare for some assets (e.g., the Euro-Bund future), trade-throughs can be quite frequent on other contracts: on the BNP stock, for instance, one finds an average of 400 trade-throughs per day. Parametric estimation using MLE on exponential kernels has been performed on Euronext stocks, restricting the intraday time window to 9:30 am to 11:30 am in order to avoid very strong intraday seasonality effects. Goodness-of-fit tests confirm that, as for the full market order flow, the main components of the 2-dimensional Hawkes kernel are the self-exciting kernels.
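Since maximum likelihood estimation of exponential kernels recurs throughout this section, here is a sketch of the log-likelihood of a $d$-dimensional Hawkes process with kernels $\phi^{ij}(t) = \alpha_{ij} e^{-\beta t}$ observed on $[0, T]$. It uses the usual recursion for exponential kernels, so the cost is linear in the number of events. The common decay rate `beta` across components is a simplifying assumption, and the sketch is not the exact estimator used in [56]. To fit parameters, one could minimize its negative with a generic optimizer such as `scipy.optimize.minimize`, under positivity constraints.

```python
import numpy as np

def exp_hawkes_loglik(times, marks, T, mu, alpha, beta):
    """Log-likelihood of a d-dimensional Hawkes process with kernels
    phi^{ij}(t) = alpha[i, j] * exp(-beta * t), observed on [0, T].

    times : event times sorted in increasing order
    marks : component index (0, ..., d-1) of each event
    """
    mu = np.asarray(mu, dtype=float)
    alpha = np.asarray(alpha, dtype=float)
    times = np.asarray(times, dtype=float)
    marks = np.asarray(marks, dtype=int)
    state = np.zeros(len(mu))  # state[j] = sum over past type-j events of exp(-beta (t - t_l))
    last = 0.0
    loglik = 0.0
    for t, m in zip(times, marks):
        state *= np.exp(-beta * (t - last))         # decay the memory up to time t
        loglik += np.log(mu[m] + alpha[m] @ state)  # intensity just before the event
        state[m] += 1.0
        last = t
    # Compensator: sum over i of the integral of lambda_i on [0, T]
    compensator = mu.sum() * T
    for j in range(len(mu)):
        tail = np.sum(1.0 - np.exp(-beta * (T - times[marks == j])))
        compensator += alpha[:, j].sum() / beta * tail
    return loglik - compensator

# Hypothetical bivariate example (e.g., trade-throughs at the bid and at the ask)
ll = exp_hawkes_loglik([0.2, 0.7, 1.1, 2.5], [0, 1, 0, 0], T=3.0,
                       mu=[0.5, 0.5], alpha=[[0.4, 0.05], [0.05, 0.4]], beta=1.5)
print(ll)
```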


6.2 Full order book model 

In [55], Muni Toke generalized the zero-intelligence model introduced in [Bl] to the Hawkes framework. In the latter model, the authors built a full model of the order book in which all the involved flows (i.e., limit and market orders at any level) are independent pure Poisson processes. This is clearly a very rough approximation: market orders are known to be long-range dependent (see Sec. 3.1) due to the splitting of large meta-orders. Moreover, one expects limit orders to be highly auto-correlated as well as correlated with the market order flow, since market makers interact with market takers. In [55], a two-agent model is introduced. It comprises the following agents:

• A liquidity provider: 

— Arrivals of limit orders are modeled by using either a pure homogeneous 
Poisson process or a 1-dimensional Hawkes process with an exponential 
kernel (cancellations are treated as in the zero-intelligence model, i.e., each order has a lifetime which is an i.i.d. exponential random variable whose parameter is fixed a priori)

— The price-levels of new limit orders are randomly chosen by first choosing 
the (bid or ask) side with probability 1/2 and then by sampling their 
values from a Student distribution 

• A liquidity taker: 

— Modeled using either a pure homogeneous Poisson process or a 1-dimensional 
Hawkes process with an exponential kernel. 

All the volumes of both limit and market orders are i.i.d. and exponentially distributed variables. Hawkes processes use exponential kernels which are estimated parametrically using Maximum Likelihood Estimation.
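Each agent's flow in the Hawkes variant of this model is a 1-dimensional exponential Hawkes process. The sketch below simulates such a flow with Ogata's thinning algorithm, exploiting the fact that the intensity only decreases between events; the parameters are hypothetical, and this is not necessarily the simulation scheme used in [55].

```python
import math
import random

def simulate_exp_hawkes(mu, alpha, beta, T, seed=None):
    """Ogata thinning for lambda(t) = mu + sum_i alpha * exp(-beta (t - t_i)) on [0, T].

    Requires mu > 0; alpha < beta is needed for a stationary regime.
    """
    rng = random.Random(seed)
    t = 0.0
    excite = 0.0   # sum of alpha * exp(-beta (t - t_i)) at the current time t
    events = []
    while True:
        lam_bar = mu + excite           # upper bound: intensity decays until the next event
        w = rng.expovariate(lam_bar)    # candidate waiting time
        if t + w > T:
            return events
        t += w
        excite *= math.exp(-beta * w)   # intensity at the candidate time
        if rng.random() * lam_bar <= mu + excite:  # accept with prob lambda(t) / lam_bar
            events.append(t)
            excite += alpha             # self-excitation jump

events = simulate_exp_hawkes(mu=1.0, alpha=0.5, beta=1.0, T=2000.0, seed=42)
print(len(events) / 2000.0)  # should be close to mu / (1 - alpha / beta) = 2 on long horizons
```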

Not surprisingly, inter-trade times of market orders are shown to be much better modeled by a 1-dimensional Hawkes process (referred to as MM, since the only Hawkes kernel involved deals with the influence of Market orders on themselves) than by a pure homogeneous Poisson process (HP). In the same way, inter-arrival times of limit orders are shown to be much better modeled by a 1-dimensional Hawkes process (LL) than by HP. Along the same lines, the left panel of Fig. 10 shows that the duration between a given market order and the first arrival of a limit order after that market order is much better reproduced by a 2-dimensional Hawkes model (referred to as MM+LL+LM) with self-exciting kernels for both market (MM) and limit orders (LL) and a single cross-exciting kernel corresponding to the influence of past market orders on future limit orders (LM). The figure shows that a model with two independent Hawkes processes (MM+LL), i.e., without the cross-exciting kernel (LM), does not perform well. In [55], Muni Toke claims that the other cross-exciting kernel (the one describing the influence of past limit orders on future market orders) is negligible. Thus the order book dynamics seems to be driven mainly by the liquidity taker agent rather than by the liquidity provider, i.e., to a first approximation, the market makers' strategy basically consists in reacting to liquidity takers, whereas liquidity takers make decisions independently of market makers.

[Figure 10 plot: left panel horizontal axis, time t; right panel horizontal axis, spread s.]

Figure 10: (Left panel) Empirical density function of the distribution of the duration between a given market order and the first arrival of a limit order after that market order. (Right panel) Empirical density function of the distribution of the bid-ask spread. In both panels, the density function is displayed for empirical BNPP time series and for the models that were fitted (using maximum likelihood) on these data, namely: a purely homogeneous Poisson model (HP) for both market and limit orders, a model with a pure Poisson process for limit orders and a 1-dimensional Hawkes process for market orders (MM), a model with two independent 1-dimensional Hawkes processes for limit and market orders (LL+MM) and finally this last model with a cross-exciting term which characterizes the influence of past market orders on future limit orders (LM). These figures are reproduced from Ref. [55].

Finally, the right panel of Fig. 10 shows the resulting distribution of the bid-ask spread for empirical data, the pure homogeneous Poisson (HP) model, the MM+LL Hawkes model and the MM+LL+LM model. Again, this latter model is the one which best fits the empirical data.

Let us also mention a theoretical work by Jedidi and Abergel [33], in which a Hawkes-based Markovian framework for the whole order-book dynamics is studied.


7 Other models 

7.1 Systemic risk models 

Hawkes models have also been used at coarser time scales, where asset prices are mainly diffusive. They can be used in mixed models, for instance to account for jumps that occur on top of the diffusion process. This is the spirit of the model developed by Ait-Sahalia et al. [53], who proposed to describe the contagion of a crisis across world markets by superimposing a multivariate self-excited Hawkes process on a standard multivariate continuous diffusion model. According to this model, called by the authors “Mutually exciting Jump-Diffusion”, the log-price vector satisfies

$$dX_t = \mu_t\,dt + \sigma_t\,dW_t + Z_t\,dN_t, \qquad (67)$$

where $W_t$ is a $D$-dimensional Brownian motion, $\sigma_t$ is a stochastic volatility and $N_t$ is a $D$-variate Hawkes process that accounts for the self-excited nature of the occurrence of price jumps ($Z_t$ is a random variable accounting for the direction and the size of the jump). A GMM estimation procedure is proposed in [63], based on closed-form expressions of some moments associated with the return variations in the univariate and bivariate cases with exponential Hawkes kernels. This estimation has been applied to data from five international equity indices associated respectively with the US, Europe, Asia, the Pacific and Latin America. The authors found that the jump terms have significant self-excitation components and, as far as the “contagion” effect is concerned, the US equities appear to be the ones with the greatest influence on other markets.
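A univariate Euler sketch of the jump-diffusion dynamics above illustrates how self-exciting jumps sit on top of a diffusion. It assumes constant drift and volatility, Gaussian jump sizes and an exponential Hawkes kernel, and it approximates the jump process by at most one Bernoulli event per time step (valid only when the intensity times the step is small). The multivariate model would replace the scalar `alpha` by a matrix of cross-excitations; none of this reproduces the GMM estimation of the original work.

```python
import math
import random

def simulate_mejd(x0, drift, sigma, mu, alpha, beta, jump_mean, jump_std,
                  T, n_steps, seed=None):
    """Euler sketch of dX = drift dt + sigma dW + Z dN, with N a univariate Hawkes
    process with kernel alpha * exp(-beta t) and Z ~ Normal(jump_mean, jump_std)."""
    rng = random.Random(seed)
    dt = T / n_steps
    x, lam = x0, mu
    path, jump_times = [x0], []
    for k in range(n_steps):
        x += drift * dt + sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        if rng.random() < lam * dt:          # Bernoulli approximation of a jump
            x += rng.gauss(jump_mean, jump_std)
            lam += alpha                     # self-excitation
            jump_times.append((k + 1) * dt)
        lam = mu + (lam - mu) * math.exp(-beta * dt)  # relax towards the baseline
        path.append(x)
    return path, jump_times

# Hypothetical parameters: rare downward jumps that cluster in time
path, jumps = simulate_mejd(x0=0.0, drift=0.0, sigma=0.2, mu=0.05, alpha=0.5, beta=2.0,
                            jump_mean=-0.05, jump_std=0.01, T=250.0, n_steps=25000, seed=3)
print(len(jumps), path[-1])
```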

In Ref. [35], Errais et al. proposed to model credit default events in a portfolio of securities as correlated point processes. More specifically, they considered that the dynamics of such events is described by a marked Hawkes process with exponential kernels. As recalled earlier, in this case the pair of processes $(\lambda, N)$ is a Markov process. Thanks to Dynkin's formula, the authors provided explicit expressions for the conditional distributions of both the marked and the counting processes. This result was used to price portfolios of credit derivatives such as index and tranche swaps. Their model was then calibrated on index and swap data from September 2008, a period that witnessed several credit defaults, notably the Lehman Brothers default. The authors showed that, by capturing the dependence between default events, their model provided good fits of market data, unlike standard approaches that failed during this period. Errais et al. emphasized that marked Hawkes models with exponential kernels belong to the more general class of affine point processes introduced by Duffie et al. [26]. An affine point process involves a stochastic intensity vector which is a Markov process with drift, diffusion and jump terms. Errais et al. [29] showed that their approach remains tractable within the class of affine point processes, which provides a richer jump interaction structure.

Let us also quote the recent similar work of Dassios and Zhao [23], who addressed the question of default risk modeling and contagion propagation within the framework of the so-called “Dynamic Contagion Process”, which is an exponential marked Hawkes model where the Poisson exogenous events (the “immigrants”) are replaced by a shot noise (constructed as the first generation of the same exponential Hawkes model with a different mark probability density). The authors established the theoretical distributional properties of this new process (which remains a Markov process) and provided analytic expressions for its probability generating function. As in Errais et al. [29], they showed that their approach is particularly suitable for modeling the dependence structure of arriving events with dynamic contagion and proposed an application to credit risk.


7.2 Accounting for news 

In [61], Rambaldi et al. used Hawkes processes to model the impact of news on the EBS (foreign exchange market) quotation time series, consisting of a list of quotation timestamps. Let us note that, since EBS data are throttled (aggregated below a window of 0.1 s), a randomization procedure has been used for homogenization. The model for the quotation arrival times is a 1-dimensional Hawkes model involving either a double exponential or a power-law kernel (expressed as a sum of 15 exponentials so as to allow a convenient Maximum Likelihood Estimation procedure). The impact of news (on the quotation time series) can be seen as a particular instance of localized non-stationarity. Thus, Rambaldi et al. introduced an exogenous kernel $\phi^{(n)}$ (an exponential function) that accounts for the impact of a particular news item, leading to the model*

$$\lambda_t = \mu + \phi \star dN_t + \phi^{(n)}(t - t_0), \qquad (68)$$

where $t_0$ is the time of occurrence of a particular news item. Let us point out that $\phi^{(n)}$ (where (n) stands for news) is allowed to have a non-causal component (i.e., $\phi^{(n)}(t)$ is nonzero for $t < 0$) in order to account for anticipation effects. Estimation is performed (using a regular Maximum Likelihood Estimation procedure) on a 3-hour period around the considered news. Though the results are very noisy, the authors showed that the model nicely captures both the amplitude and the time scale of the news effect. The norm $\|\phi^{(n)}\|$ has a broad distribution across news items (extending beyond 1), which clearly reflects the diverse effects of news on the market. Moreover, using some proxies for quantifying how unexpected a particular news item is, Rambaldi et al. clearly showed that the norm $\|\phi^{(n)}\|$ is related not only to the news impact but also to its degree of surprise.
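The sketch below evaluates an intensity of the form of Eq. (68) with a single exponential endogenous kernel and a two-sided exponential news kernel, whose branch before $t_0$ captures anticipation. Both kernel shapes and all parameter names are assumptions for illustration, not the parametrization of Rambaldi et al. For a linear model, each event caused directly by the news triggers on average $1/(1 - \|\phi\|)$ events in total (itself included), which gives the expected number of news-driven events.

```python
import math

def news_kernel(u, a_after, b_after, a_before, b_before):
    """Two-sided exponential news kernel phi^(n)(u), with u = t - t0 (an assumed shape).
    The u < 0 branch models anticipation before the news release."""
    if u >= 0:
        return a_after * math.exp(-b_after * u)
    return a_before * math.exp(b_before * u)

def news_kernel_norm(a_after, b_after, a_before, b_before):
    """||phi^(n)||: integral of the two-sided exponential over the whole real line."""
    return a_after / b_after + a_before / b_before

def intensity(t, events, t0, mu, alpha, beta, news_params):
    """lambda_t = mu + sum_i alpha * exp(-beta (t - t_i)) + phi^(n)(t - t0)."""
    endo = sum(alpha * math.exp(-beta * (t - s)) for s in events if s < t)
    return mu + endo + news_kernel(t - t0, *news_params)

def expected_news_events(news_params, alpha, beta):
    """Expected number of extra events attributable to the news, including the
    events they trigger in turn: ||phi^(n)|| / (1 - alpha / beta)."""
    return news_kernel_norm(*news_params) / (1.0 - alpha / beta)

# Hypothetical parameters: strong, fast-decaying reaction and mild anticipation
params = (2.0, 0.5, 0.5, 1.0)
print(intensity(0.3, [0.1, 0.2], 0.0, 1.0, 0.5, 1.0, params))
print(expected_news_events(params, alpha=0.5, beta=1.0))
```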


* Formally, this model can be seen as a 1-dimensional version of the 2-dimensional price impact model of Eq. (65) in which $f(r_t)$ is replaced by the Dirac distribution centered at time $t_0$, $\delta(t - t_0)$.
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7.3 High-dimensional models 

Modeling co-jumps. In [12], Bormetti et al. modeled the complex dynamics 
of the extreme returns in a basket of stocks. They started by elaborating a rather 
elaborate procedure for identifying extreme returns (referred to as “jumps” in the 
paper, corresponding to anomalous values) from 1-minute returns time-series. On 
a basket of IV = 20 Italian stocks they found up to 280 jumps on a single stock 
on the overall period (of 88 days) and up to a total of 505 jumps per day on all 
the stocks. They clearly established that some jumps arrive at the same time, i.e., 
within the same time-window. This prevents from modeling the overall process as an 
iV-dimensional Hawkes process: Within such a framework, co-jumps cannot occur 
without introducing an impulsive component inside the kernel d>. Bormetti et al. 
finally suggested the following model 

• A 1-dimensional Hawkes process is used for modeling the co-jumps arrival 
times. The kernel of the Hawkes process is chosen to be a sum of exponential 
functions. 

• Each time t a co-jump occurs, each stock i, independently one form each other, 
has a probability pi to jump. These probabilities are estimated empirically 
from real data. 

• For each stock i, a 1-dimensional Hawkes process is used to model the id¬ 
iosyncratic jumps of that stock, i.e., the jumps that are do not correspond to 
co-jumps. Again, the kernels of the Hawkes process is chosen to be a sum of 
exponential functions. 

In their paper, Bormetti et al. developed a precise estimation procedure for this
model, which is shown to be quite robust. Moreover, they showed that the model is able
to capture simultaneously the time clustering of jumps and the high synchronization
of jumps across assets.
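To make the three ingredients of this model concrete, here is a minimal Python sketch of a forward simulation with the same structure. It is not the estimation procedure of Bormetti et al.: the parameter values, the function names (`simulate_hawkes_1d`, `simulate_cojump_model`) and the choice of sharing one idiosyncratic kernel shape across stocks are hypothetical simplifications. Each 1-dimensional Hawkes process is generated through its cluster (branching) representation, which is exact for sums of exponential kernels.

```python
import numpy as np


def simulate_hawkes_1d(mu, alphas, betas, T, rng):
    """Simulate a 1-D Hawkes process on [0, T) with kernel
    phi(t) = sum_k alphas[k] * betas[k] * exp(-betas[k] * t)
    via the cluster (branching) representation. Needs 0 < sum(alphas) < 1."""
    alphas = np.asarray(alphas, dtype=float)
    betas = np.asarray(betas, dtype=float)
    norm = alphas.sum()                      # branching ratio ||phi||
    weights = alphas / norm                  # mixture weights of the delay density
    # Generation 0: immigrants, homogeneous Poisson process of rate mu
    generation = rng.uniform(0.0, T, size=rng.poisson(mu * T))
    events = [generation]
    while generation.size > 0:
        # each event has Poisson(norm) children ...
        parents = np.repeat(generation, rng.poisson(norm, size=generation.size))
        # ... at delays drawn from the normalized kernel (mixture of exponentials)
        comp = rng.choice(betas.size, size=parents.size, p=weights)
        children = parents + rng.exponential(1.0 / betas[comp])
        generation = children[children < T]  # drop events beyond the horizon
        events.append(generation)
    return np.sort(np.concatenate(events))


def simulate_cojump_model(mu_c, alphas_c, betas_c, p, mu_idio, alphas_i, betas_i, T, seed=0):
    """Hypothetical co-jump simulation: a common Hawkes process for co-jump times,
    Bernoulli(p[i]) participation of stock i in each co-jump, plus an independent
    idiosyncratic Hawkes process for each stock."""
    rng = np.random.default_rng(seed)
    common = simulate_hawkes_1d(mu_c, alphas_c, betas_c, T, rng)
    jumps = []
    for i in range(len(p)):
        joined = common[rng.random(common.size) < p[i]]   # stock i joins with prob. p[i]
        idio = simulate_hawkes_1d(mu_idio[i], alphas_i, betas_i, T, rng)
        jumps.append(np.sort(np.concatenate([joined, idio])))
    return common, jumps
```

For instance, `simulate_cojump_model(0.05, [0.3, 0.2], [1.0, 0.05], [0.6, 0.4, 0.8], [0.02, 0.03, 0.01], [0.2, 0.1], [2.0, 0.1], 1000.0)` returns the co-jump times and one sorted array of jump times per stock; every co-jump time appears in several stocks at once, which is exactly what a plain $N$-dimensional Hawkes process cannot reproduce.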

Clustering with graph models. In [33], Linderman and Adams developed
a probabilistic model that combines Hawkes processes with random graph models,
which they applied to S&P 100 data. Each component of the Hawkes process encodes the
changes (of more than 0.1%) in the last traded price of a given asset during a whole
week. A 100-dimensional point process with 182,037 events is thus obtained. The
kernels are chosen as

$$\Phi^{ij}(t) = A^{ij} W^{ij} h_\theta(t), \tag{69}$$

where $A$ is a random binary (0 or 1) valued matrix, $W$ a random matrix with positive
entries and $h_\theta(t)$ a parametric kernel (with parameters $\theta$) such that $\int h_\theta = 1$ (a
logistic normal density with two parameters is chosen). Intraday seasonality is modeled
using an exogenous intensity $\mu$ that combines a constant matrix with a univariate
Gaussian process. The random graph model is used to encode the probability of the
different network structures through the prior distributions of the matrices $A$ and $W$.
These priors basically depend on a latent distance, chosen so as to impose an overall
sparsity (20%) and a characteristic distance scale. Thus each stock $k$ corresponds to
a latent coordinate in $\mathbb{R}^2$ that is estimated through a fully Bayesian, parallel
inference algorithm. A figure in [33] represents each stock as a dot in the latent
2-dimensional space, colored according to its sector (among six sectors). Linderman
and Adams showed that some sectors, notably energy and financial, tend to cluster
together, indicating an increased probability of interaction between stocks in the
same sector. Other sectors, such as consumer goods, are broadly distributed, suggesting
that these stocks are less influenced by others in their sector. One can expect this
approach to pave the way, within the framework of multivariate Hawkes processes, for
very promising applications in which large amounts of data associated with a great
number of interacting components/agents will be processed in order to shed new light
on the complexity of financial markets.
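The kernel parametrization used by Linderman and Adams, $\Phi^{ij}(t) = A^{ij} W^{ij} h_\theta(t)$ with $\int h_\theta = 1$, can be illustrated with the following Python sketch. The logistic normal density is taken here as the law of $t = t_{\max}\,\sigma(X)$ with $X \sim \mathcal{N}(m, s^2)$ and $\sigma$ the logistic function; the window length $t_{\max}$, the parameter values and the function names are assumptions made for illustration, not taken from [33]. Since $h_\theta$ integrates to one, the norm matrix $\|\Phi\|$ is simply the entrywise product of $A$ and $W$, whose spectral radius must be below 1 for the process to be stable.

```python
import numpy as np


def logistic_normal_density(t, m, s, t_max):
    """Density of t = t_max * sigmoid(X), X ~ N(m, s^2); zero outside (0, t_max)."""
    t = np.asarray(t, dtype=float)
    out = np.zeros_like(t)
    inside = (t > 0) & (t < t_max)
    u = t[inside] / t_max
    x = np.log(u / (1.0 - u))                                  # logit of t / t_max
    gauss = np.exp(-0.5 * ((x - m) / s) ** 2) / (s * np.sqrt(2.0 * np.pi))
    out[inside] = gauss * t_max / (t[inside] * (t_max - t[inside]))  # change of variables
    return out


def kernel_matrix(t, A, W, m, s, t_max):
    """Phi^{ij}(t) = A_ij * W_ij * h_theta(t) at a scalar delay t, as a (D, D) array."""
    return A * W * logistic_normal_density(np.array([t]), m, s, t_max)[0]


def is_stable(A, W):
    """||Phi|| = A * W (h_theta has unit mass); stable iff its spectral radius is < 1."""
    return np.max(np.abs(np.linalg.eigvals(A * W))) < 1.0
```

A random sparse network in the spirit of the model can then be checked with, e.g., `A = (rng.random((D, D)) < 0.2).astype(float)` and positive weights `W = rng.gamma(2.0, 0.1, size=(D, D))`, followed by `is_stable(A, W)`.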

Finally, let us note that Mastromatteo and Marsili tried to overcome the
problem of estimating a high-dimensional Hawkes process by mapping it onto
a graphical (Ising) model, so as to describe the clustering of trading times in a set of
stocks. They reconstructed the interaction network of the $D = 100$ most traded
stocks of the NYSE during the year 2003 by using machine learning techniques, and
reported an overall scaling of the weight matrix $W$ as an inverse power of $D$. Their
study reveals the presence of a large collective market mode of the matrix $A$, driving
the overall level of market activity close to the critical point.

8 Concluding remarks 

Hawkes processes are extremely versatile processes that can be considered as the
building blocks for modeling the occurrence of time-correlated discrete events, playing
the same role that autoregressive models play for describing continuous-valued
signals. Within a relatively simple mathematical framework, they allow one to
characterize precisely the interactions between different categories of events and to
account for their causal relations. From the statistical point of view, they can be
generated with relatively simple simulation algorithms, and several efficient
estimation methods exist to calibrate them.

Hawkes processes have already proven to be very useful in many domains, such as
the modeling of earthquakes or of neuronal and social network activity. Because they
allow one to describe data at the resolution of individual events and to account
for endogenous triggering, contagion and cross-excitation phenomena, they have
naturally found applications in the field of high-frequency finance. In this paper, we
have proposed an overview of many such applications, which concern a wide variety
of problems. From the question of the level of endogeneity of financial markets to
the modeling of order-book events, including market impact, risk contagion modeling
and the design of optimal execution strategies, we saw that Hawkes models have been
used to address a large spectrum of issues.

Far from being exhaustive, this review is necessarily a “snapshot” of a topic that
is bound to evolve and grow rapidly. Many of the studies we mentioned can
be considered as pioneering works that herald more important results and a deeper
understanding of the market dynamics at the microstructural level. Among the
promising routes that will be explored in forthcoming studies, some can be easily
anticipated. Since several market stylized facts at medium or large time scales
seem to originate from the market microstructure, the micro-to-macro transition is
an important issue. In that respect, understanding the long-time behavior of the
Hawkes process and the instabilities that can emerge can be of great importance.
One may hope not only to recover known models at larger times, but also to account
for genuinely new phenomena such as microstructural crashes. Beyond the Brownian
limit that one expects in usual situations, elucidating the emerging properties of
the Hawkes process close to the instability limit seems a promising area of research,
both because of the richness of the model in the vicinity of the critical point and
because empirical data seem to indicate that markets operate close to this instability
threshold.

The study and the estimation of Hawkes models in the high-dimensional regime is
also a challenging prospect, since the complexity of financial markets results notably
from the interaction between a very large number of components (market participants,
agents, assets, information fluxes, ...). Much progress has been made
recently in that direction, notably under the impetus of the big-data community
and of studies of viral diffusion across social networks. One can expect many
interesting applications of these approaches to high-frequency finance to be proposed
in the near future.

Appendices 

A Table of financial applications found within each discussed paper

Each of the academic works discussed throughout our paper that involves numerical
experiments on financial data is listed in the following table. Apart from the
self-explanatory column names, $D$ stands for the dimension of the Hawkes model, $T$ for the
length of the historical data used to calibrate the model, $dt$ for the time resolution
of the data and $\Phi(t)$ for the shape of the Hawkes kernel. In the latter column, $N$-Exp
stands for a sum of $N$ exponentials, PL for “power-law” and NP for “non-parametric”.
B Simulation


There are two main frameworks for simulating Hawkes processes: an intensity-based
framework and a cluster-based framework, depending on whether the algorithm makes direct
use of the intensity regression Eq. ([^ or of the cluster representation of the Hawkes
process (see Sec. 2.3.7). The most popular method is certainly the intensity-based
thinning method, initially introduced by Lewis [48] in the general context of
non-homogeneous Poisson processes and later modified by Ogata [55].


Thinning. The thinning algorithm is an incremental procedure in which the successive
jump times are generated sequentially. It can easily be adapted to the
multi-dimensional context and generalized to the marked case. In its simplest
version (valid for decreasing kernels), given a starting time $t$, it basically consists in
sampling a candidate jump time $t + \Delta t$ using an exponentially distributed random
variable $\Delta t$ with parameter $\bar\lambda = \sum_{i=1}^{D} \lambda^i_t$. Then a random variable $U$ is
uniformly drawn in the interval $[0, \bar\lambda]$. If $U > \sum_{i=1}^{D} \lambda^i_{t+\Delta t}$, the jump is rejected.
If not rejected, the jump is assigned to the component $i$, where $i$ is the largest
index satisfying $U > \sum_{j=1}^{i-1} \lambda^j_{t+\Delta t}$. In both cases, the procedure is then iterated
from time $t + \Delta t$. Let us point out that the complexity of the algorithm can be
substantially reduced in the case of exponential kernels.
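The thinning procedure can be sketched in Python as follows, assuming exponential kernels $\Phi^{ij}(t) = \alpha^{ij}\beta e^{-\beta t}$ with a single decay rate $\beta$ (a hypothetical simplification; names and parameters are illustrative). With such kernels the intensities are updated recursively, which is the complexity reduction mentioned above: each candidate costs $O(D)$ operations instead of a sum over the whole past.

```python
import numpy as np


def simulate_hawkes_thinning(mu, alpha, beta, T, seed=0):
    """Thinning simulation of a D-dim Hawkes process on [0, T) with kernels
    Phi^{ij}(t) = alpha[i, j] * beta * exp(-beta * t). Returns (times, components)."""
    rng = np.random.default_rng(seed)
    mu = np.asarray(mu, dtype=float)
    alpha = np.asarray(alpha, dtype=float)
    excess = np.zeros_like(mu)       # lambda^i_t - mu^i, decays at rate beta between jumps
    t = 0.0
    times, comps = [], []
    while True:
        # Kernels are decreasing: the current total intensity bounds it until the next jump
        lam_bar = mu.sum() + excess.sum()
        dt = rng.exponential(1.0 / lam_bar)
        if t + dt >= T:
            break
        t += dt
        excess *= np.exp(-beta * dt)             # O(D) update thanks to exponential kernels
        lam = mu + excess                         # intensities at the candidate time
        u = rng.uniform(0.0, lam_bar)
        if u > lam.sum():
            continue                              # candidate rejected
        # component i: largest index with u > sum_{j<i} lam_j
        i = min(int(np.searchsorted(np.cumsum(lam), u)), mu.size - 1)
        times.append(t)
        comps.append(i)
        excess += alpha[:, i] * beta              # an event of type i excites each k via Phi^{ki}
    return np.array(times), np.array(comps, dtype=int)
```

Since the empirical rates should approach $\Lambda = (\mathbb{I} - \alpha)^{-1}\mu$ for a stable process, comparing the event counts per unit time with this vector is a simple sanity check.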


Time-change. Another common intensity-based approach for simulating a non-homogeneous
Poisson process $N_t$ uses the following well-known fact: if one defines
the cumulative intensity function $F_t = \int_0^t \lambda_u \, du$, then $N_{F^{-1}_t}$ is a homogeneous
Poisson process of intensity 1. Thus, in order to perform the simulation, one needs to
know how to compute $t_m = F^{-1}(\tau_m)$, where the increments $\tau_{m+1} - \tau_m$ are independent
and exponentially distributed with parameter 1.

This algorithm has been applied, for instance, in the case of exponential kernels in [24].
In fact, in this simpler case it is possible to invert the function $F$ analytically.
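For a 1-dimensional Hawkes process with kernel $\Phi(t) = \alpha\beta e^{-\beta t}$, the intensity is deterministic between two consecutive jumps, so the time-change method reduces to inverting $F$ on each inter-event interval. The Python sketch below performs this inversion numerically by bisection rather than through the analytical inverse, which keeps it short and dependency-free; the function names and parameters are hypothetical.

```python
import math
import numpy as np


def integrated_intensity(s, mu, excess, beta):
    """int_0^s lambda over an inter-jump interval, with lambda(u) = mu + excess * exp(-beta * u)."""
    return mu * s + excess * (1.0 - math.exp(-beta * s)) / beta


def simulate_hawkes_time_change(mu, alpha, beta, T, seed=0, n_bisect=60):
    """1-D Hawkes process with kernel alpha * beta * exp(-beta * t) on [0, T), simulated
    by inverting the cumulative intensity between consecutive jumps."""
    rng = np.random.default_rng(seed)
    t, excess, times = 0.0, 0.0, []
    while True:
        e = rng.exponential(1.0)                  # unit-rate exponential increment
        # Solve integrated_intensity(s) = e; the root lies in [0, e / mu]
        lo, hi = 0.0, e / mu
        for _ in range(n_bisect):
            mid = 0.5 * (lo + hi)
            if integrated_intensity(mid, mu, excess, beta) < e:
                lo = mid
            else:
                hi = mid
        if t + hi >= T:
            break
        t += hi
        excess = excess * math.exp(-beta * hi) + alpha * beta   # jump of the intensity
        times.append(t)
    return np.array(times)
```

The bracket $[0, e/\mu]$ is valid because the integrated intensity is increasing and already exceeds $e$ at $s = e/\mu$.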

Cluster algorithm. The cluster approach basically consists in simulating the
branching structure described in Sec. 2.3.7. Hence, the algorithm is sequential in
the generation index $n$ rather than in real time $t$. See, for instance, [54],
which describes how one can simulate marked Hawkes processes using
the branching structure. A short survey and a more comprehensive list of references
can be found in the recent work [24].
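A generation-by-generation version of the cluster algorithm can be sketched in Python as follows, again assuming exponential kernels $\Phi^{ij}(t) = \alpha^{ij}\beta e^{-\beta t}$ (a hypothetical choice under which an event of type $j$ has a Poisson($\alpha^{ij}$) number of children of type $i$, each at an exponential delay of rate $\beta$). Immigrants are only generated on $[0, T)$, so the sketch ignores clusters started before time 0.

```python
import numpy as np


def simulate_hawkes_cluster(mu, alpha, beta, T, seed=0):
    """Cluster simulation of a D-dim Hawkes process on [0, T) with kernels
    alpha[i, j] * beta * exp(-beta * t). Returns sorted (times, components)."""
    rng = np.random.default_rng(seed)
    mu = np.asarray(mu, dtype=float)
    alpha = np.asarray(alpha, dtype=float)
    D = mu.size
    # Generation 0: immigrants of each type i, homogeneous Poisson with rate mu[i]
    gen_t, gen_c = [], []
    for i in range(D):
        n = rng.poisson(mu[i] * T)
        gen_t.append(rng.uniform(0.0, T, n))
        gen_c.append(np.full(n, i))
    gen_t, gen_c = np.concatenate(gen_t), np.concatenate(gen_c)
    all_t, all_c = [gen_t], [gen_c]
    while gen_t.size > 0:                      # loop over generations n = 1, 2, ...
        child_t, child_c = [], []
        for i in range(D):
            # a parent of type j has Poisson(alpha[i, j]) children of type i
            parents = np.repeat(gen_t, rng.poisson(alpha[i, gen_c]))
            kids = parents + rng.exponential(1.0 / beta, parents.size)
            kids = kids[kids < T]              # drop events beyond the horizon
            child_t.append(kids)
            child_c.append(np.full(kids.size, i))
        gen_t, gen_c = np.concatenate(child_t), np.concatenate(child_c)
        all_t.append(gen_t)
        all_c.append(gen_c)
    t, c = np.concatenate(all_t), np.concatenate(all_c)
    order = np.argsort(t)
    return t[order], c[order]
```

Unlike thinning, no candidate is ever rejected, and each generation is produced with vectorized draws; the price is that the events are only ordered in time at the very end.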


C Statistical inference 

C.1 Parametric estimation

Maximum Likelihood Estimation. The most commonly used technique for
parametric estimation of Hawkes processes is the Maximum Likelihood Estimator
(MLE), first introduced by Ogata in (SD). The log-likelihood on $[0, T]$ of a
non-homogeneous, multi-dimensional point process with intensity $\lambda^i_t$ defined as in Eq. ([^ reads

$$\log \mathcal{L}(\mu, \Phi) = -\sum_{i=1}^{D} \int_0^T \lambda^i_t \, dt + \sum_{m=1}^{M} \log \lambda^{i_m}_{t_m}, \tag{70}$$

where the couples $(t_m, i_m)$ denote respectively the event times and the corresponding components.

Given a Hawkes kernel $\Phi_\theta$ depending upon a set of parameters $\theta$, it is possible to
estimate it from Eq. (70) by solving the problem

$$(\hat\mu, \hat\theta) = \arg\max_{(\mu, \theta)} \log \mathcal{L}(\mu, \Phi_\theta). \tag{71}$$


In the case of a general Hawkes process, the computation of the likelihood (or of its
gradient) is of the order of $O(M^2 D)$, where $M$ is the total number of jumps. The
fact that it reduces to $O(MD)$ when the kernels are exponential functions is one of
the major reasons why exponential kernels (or sums of exponential kernels) are so
commonly used in a parametric framework.
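The $O(M)$ evaluation can be illustrated in Python for a 1-dimensional process with kernel $\Phi(t) = \alpha\beta e^{-\beta t}$ (a hypothetical single-exponential case, $D = 1$). The sum over past events in each $\lambda_{t_m} = \mu + \alpha\beta R_m$ is carried by the recursion $R_m = e^{-\beta(t_m - t_{m-1})}(1 + R_{m-1})$, $R_1 = 0$, and the compensator has the closed form $\mu T + \alpha\sum_m (1 - e^{-\beta(T - t_m)})$. A direct $O(M^2)$ evaluation is included for comparison, and the maximization (71) is delegated to SciPy's Nelder-Mead optimizer on log-parameters; the starting point and function names are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import minimize


def loglik_exp(mu, alpha, beta, times, T):
    """O(M) log-likelihood (70) for D = 1 and Phi(t) = alpha * beta * exp(-beta * t)."""
    times = np.asarray(times, dtype=float)
    R = np.zeros_like(times)
    for m in range(1, times.size):               # R_m = sum_{k<m} exp(-beta (t_m - t_k))
        R[m] = np.exp(-beta * (times[m] - times[m - 1])) * (1.0 + R[m - 1])
    lam = mu + alpha * beta * R                   # intensity at each event time
    compensator = mu * T + alpha * np.sum(1.0 - np.exp(-beta * (T - times)))
    return np.sum(np.log(lam)) - compensator


def loglik_exp_naive(mu, alpha, beta, times, T):
    """Same quantity with a direct O(M^2) sum over past events."""
    times = np.asarray(times, dtype=float)
    lam = np.array([mu + alpha * beta * np.sum(np.exp(-beta * (t - times[times < t])))
                    for t in times])
    compensator = mu * T + alpha * np.sum(1.0 - np.exp(-beta * (T - times)))
    return np.sum(np.log(lam)) - compensator


def fit_exp(times, T, x0=(0.5, 0.5, 1.0)):
    """MLE (71) over (mu, alpha, beta), optimized on logs to keep them positive."""
    nll = lambda x: -loglik_exp(*np.exp(x), times, T)
    res = minimize(nll, x0=np.log(x0), method="Nelder-Mead")
    return np.exp(res.x)
```

Both functions return the same value up to rounding, but only the recursive one scales linearly with the number of events.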


EM based Estimation. In [57], the authors use an Expectation-Maximization
(EM) based technique, an iterative procedure alternating two steps, expectation (E)
and maximization (M), in order to exploit the cluster representation of the Hawkes
process described in Sec. 2.3.7. Given a parametrization of the Hawkes kernel and a
set of parameters $(\mu, \theta)$, each iteration consists in:

• E-step $(\mu, \theta) \to \{p_{mm'}\}$: estimating, for every ordered pair of jumps $(t_m, t_{m'})$
(with $t_m < t_{m'}$), the probability $p_{mm'}$ that, within the cluster framework
described in Sec. 2.3.7, $t_m$ is the direct ancestor (parent) of $t_{m'}$.

• M-step $\{p_{mm'}\} \to (\mu, \theta)$: computing the model parameters $(\mu, \theta)$ from the
probabilities $\{p_{mm'}\}$ estimated in the E-step.

This algorithm converges very quickly when the product of the average intensity $\Lambda$
and the characteristic timescale $\tau$ of the support of $\Phi$ is small, i.e., $\Lambda\tau \ll 1$, meaning
that few jumps are observed in intervals of size $\tau$.
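A minimal Python sketch of these two steps is given below for a 1-dimensional process with kernel $\Phi(t) = \alpha\beta e^{-\beta t}$ (a hypothetical choice; the method of [57] is not restricted to it). In the E-step, $p_{mm'}$ is the share of the intensity at $t_{m'}$ contributed by $t_m$, and $\mu/\lambda_{t_{m'}}$ the probability that $t_{m'}$ is an immigrant; the M-step then has closed-form updates when edge effects at $T$ are neglected. The E-step stores an $M \times M$ matrix, so the sketch is only meant for small samples.

```python
import numpy as np


def em_exp(times, T, mu=0.5, alpha=0.5, beta=1.0, n_iter=200):
    """EM estimation of (mu, alpha, beta) for D = 1 and Phi(t) = alpha * beta * exp(-beta * t)."""
    times = np.asarray(times, dtype=float)
    M = times.size
    dt = times[None, :] - times[:, None]          # dt[m, m'] = t_m' - t_m
    mask = dt > 0                                  # a parent must precede its child
    dt_pos = np.where(mask, dt, 0.0)
    for _ in range(n_iter):
        # E-step: triggering probabilities p_{mm'} and immigrant probabilities
        trig = np.where(mask, alpha * beta * np.exp(-beta * dt_pos), 0.0)
        lam = mu + trig.sum(axis=0)                # intensity at each t_m'
        P = trig / lam[None, :]                    # P[m, m'] = p_{mm'}
        p0 = mu / lam
        # M-step: closed-form updates (edge effects at T neglected)
        mu = p0.sum() / T                          # expected number of immigrants / T
        alpha = P.sum() / M                        # expected number of children per event
        beta = P.sum() / (P * dt_pos).sum()        # weighted MLE of an exponential rate
    return mu, alpha, beta
```

Since, for every $m'$, the immigrant probability and the $p_{mm'}$ sum to one, the updates always satisfy $\mu T + \alpha M = M$: the expected numbers of immigrants and of triggered events add up to the observed number of events.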

Let us point out that the Generalized Method of Moments (GMM) can also be used [21],
based, for instance, on analytical formulas for the auto-covariance such as (29).


High-dimensional estimation. None of the previous techniques can be applied
in a high-dimensional context (e.g., $D > 100$) without substantial modifications. For
instance, even in the case of “simple” exponential kernels, the number of parameters
is of the order of $D^2$, so that applications to contexts in which $D$ is large become
unfeasible due to overfitting and/or computational issues. In order to address the issue
of parametric estimation in large dimensions, one needs to use algorithms that
involve regularization. Within the last year, Hawkes processes in large dimension
have been the subject of numerous academic papers (see for instance [7D]). Most
of them adapt more or less “classical” convex optimization techniques to the framework
of Hawkes processes with exponential kernels of the form $\Phi^{ij}(t) = \alpha^{ij}\beta^{ij} e^{-\beta^{ij} t}$. Let us
point out that, in all these papers, in order to get a concave log-likelihood (70) (i.e., a
convex optimization problem), the parameters $\beta^{ij}$ are fixed a priori (i.e., they are not
estimated), and generally not chosen to depend on $i$ or $j$. Thus, in that case, the kernel
matrix can be written in matrix form as $\Phi(t) = \alpha\beta e^{-\beta t}$, where the so-called (weighted)
adjacency matrix $\alpha = (\alpha^{ij})$ describes the “connections” between the different components
of the Hawkes process. In order to solve the resulting convex optimization problem
(minimizing minus the log-likelihood), penalization terms on $\alpha$ are customarily
introduced. They are generally of two sorts: an $\ell_1$ penalization that promotes sparsity
and a trace-norm penalization that promotes low rank. To the best of our knowledge,
these types of algorithms have only been used once in the context of finance (see
Sec. 7.3 about [H]). One can easily foresee that a large number of new algorithms for
parametric estimation in large dimensions will appear in the near future, some of them
improving the state of the art.
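Both penalizations are typically handled through their proximal operators inside a proximal gradient scheme. The Python sketch below shows these operators for a penalty on the adjacency matrix $\alpha$: entrywise soft-thresholding for the $\ell_1$ norm (here combined with a projection onto nonnegative entries, an assumption suited to purely exciting kernels) and singular-value soft-thresholding for the trace norm. The gradient of minus the log-likelihood, the step size and the penalty level are assumed to be supplied by the user, and the function names are hypothetical.

```python
import numpy as np


def prox_l1_nonneg(a, tau):
    """Prox of tau * ||a||_1 restricted to a >= 0: shift by tau, then clip at 0."""
    return np.maximum(a - tau, 0.0)


def prox_trace_norm(a, tau):
    """Prox of tau * ||a||_* (trace / nuclear norm): soft-threshold the singular values."""
    U, s, Vt = np.linalg.svd(a, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt


def proximal_gradient_step(a, grad, step, penalty, tau):
    """One step a <- prox_{step * tau * penalty}(a - step * grad), where grad is the
    gradient of minus the log-likelihood with respect to a."""
    z = a - step * grad
    if penalty == "l1":
        return prox_l1_nonneg(z, step * tau)
    if penalty == "trace":
        return prox_trace_norm(z, step * tau)
    raise ValueError("penalty must be 'l1' or 'trace'")
```

The $\ell_1$ operator zeroes small connections one by one, whereas the trace-norm operator discards the weakest singular directions of $\alpha$, which is how low-rank structure emerges.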


C.2 Non-parametric estimation 

Very few non-parametric estimators of the kernel matrix of a Hawkes process have
appeared in the academic literature so far. We review them below.


EM based estimation. Historically, the first one [17] corresponds to a non-parametric
version of the EM estimation algorithm described in the previous section.
It is based on a regularization (via penalization) of the method initially introduced
by [50] in the framework of the ETAS model for seismology, and it has been developed for
1-dimensional Hawkes processes. The maximum likelihood estimator is computed
using the same two steps as in the parametric case: the E-step corresponds to
estimating all the probabilities $p_{mm'}$ and the M-step to estimating $\Phi$
and $\mu$ from these probabilities. In [17], numerical experiments on particular
cases are performed successfully even when the exogenous intensity depends slowly
on time: the whole function $\mu(t)$ is estimated along with the kernel with a
very good approximation. However, as explained in |S], this method has two main
drawbacks:

• The convergence speed of the EM algorithm drastically decreases when the
kernel decays slowly (e.g., power-law decaying kernels).

• The probabilistic interpretation of the kernel values involved in the EM method
prevents the kernels from taking negative values (see Sec. 2.2.4).


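Contrast function based estimation. In a recent series of papers [55],
some authors proposed, within a rigorous statistical framework, a second approach
to non-parametric estimation. It relies on the minimization of the so-called
contrast function. Given a realization of a Hawkes process $N_t$ on an interval $[0, T]$,
associated with the parameters $(\mu, \Phi(t))$, the estimation is based on minimizing the
contrast function $C(\mu, \Phi)$:

$$(\hat\mu, \hat\Phi) = \arg\min_{(\mu, \Phi)} C(\mu, \Phi), \tag{72}$$

where

$$C(\mu, \Phi) = \sum_{i=1}^{D} \left( \int_0^T \lambda^i_t(\mu, \Phi)^2 \, dt - 2 \int_0^T \lambda^i_t(\mu, \Phi) \, dN^i_t \right) \tag{73}$$

and

$$\lambda^i_t(\mu, \Phi) = \mu^i + \sum_{j=1}^{D} \int_0^{t^-} \Phi^{ij}(t - s) \, dN^j_s. \tag{74}$$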


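Let us point out that minimizing the expectation of the contrast function is equivalent
to minimizing the error on the intensity process. Indeed, if $\mathcal{F}_t$ is the
information available up to time $t$, since $\mathbb{E}[dN^i_t \mid \mathcal{F}_t] = \lambda^i_t \, dt$, one has

$$\arg\min_{(\mu, \Phi)} \mathbb{E}[C(\mu, \Phi)] = \arg\min_{(\mu, \Phi)} \sum_{i=1}^{D} \left( \mathbb{E}\left[\int_0^T \lambda^i_t(\mu, \Phi)^2 \, dt\right] - 2\,\mathbb{E}\left[\int_0^T \lambda^i_t(\mu, \Phi)\, \lambda^i_t \, dt\right] \right) \tag{75}$$

$$= \arg\min_{(\mu, \Phi)} \sum_{i=1}^{D} \mathbb{E}\left[\int_0^T \left(\lambda^i_t(\mu, \Phi) - \lambda^i_t\right)^2 dt\right], \tag{76}$$

since the two minimized quantities differ by the constant $\sum_{i} \mathbb{E}[\int_0^T (\lambda^i_t)^2 \, dt]$, which does
not depend on $(\mu, \Phi)$. The minimum value (zero) of the quantity in (76) is of course uniquely
reached when $\mu$ and $\Phi$ coincide with the true parameters of the process.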

In [33], the authors chose to decompose $\Phi$ on a finite-dimensional space (in
practice, the space of piecewise constant functions) and to solve the
minimization problem (72) directly in that space. In order to regularize the solution
(they essentially have in mind applications for which only a small amount of data is
available and for which the kernels are known to be well localized), they chose to
penalize the minimization with a Lasso term (well known to induce sparsity in the
kernels), i.e., the $\ell_1$ norm of the components of $\Phi$. Let us point out that minimizing
the contrast function and minimizing the expectation of the contrast function are two
different stories. The contrast function is stochastic, and nothing guarantees that the
associated linear equation is not ill-conditioned. In [33], the authors prove that, under
certain conditions on $\Phi$, the linear equation is invertible, i.e., the associated random
Gram matrix is almost surely positive definite. In practice (they study real signals
from neurobiology), they choose the components of $\Phi$ to be piecewise constant.
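With a piecewise constant kernel, $\lambda_t(\mu, \Phi)$ is linear in the unknowns $\theta = (\mu, \phi_1, \ldots, \phi_K)$: writing $x_t = (1, n_1(t), \ldots, n_K(t))$, where $n_k(t)$ counts the past events whose delay falls in bin $k$, the contrast $C = \int_0^T \lambda_t(\mu, \Phi)^2 \, dt - 2\int_0^T \lambda_t(\mu, \Phi)\, dN_t$ becomes the quadratic form $\theta^\top G \theta - 2 b^\top \theta$ with $G = \int_0^T x_t x_t^\top dt$ (the Gram matrix) and $b = \sum_m x_{t_m}$. The Python sketch below builds $G$ and $b$ for $D = 1$, approximating the time integral on a fine grid (a hypothetical simplification), and minimizes the Lasso-penalized contrast by proximal gradient (ISTA). Bin edges, penalty level and function names are illustrative assumptions, and the weighting and tuning used in [33] are not reproduced.

```python
import numpy as np


def features(t, times, edges):
    """x_t = (1, n_1, ..., n_K), n_k = number of events t_j < t with t - t_j in bin k."""
    delays = t - times[times < t]
    counts, _ = np.histogram(delays, bins=edges)
    return np.concatenate(([1.0], counts))


def contrast_terms(times, T, edges, h=0.01):
    """Quadratic form of the contrast for D = 1 and a piecewise constant kernel:
    C(theta) = theta' G theta - 2 b' theta, with theta = (mu, phi_1, ..., phi_K)."""
    times = np.sort(np.asarray(times, dtype=float))
    G = np.zeros((len(edges), len(edges)))           # K + 1 = len(edges) unknowns
    for t in np.arange(h / 2, T, h):                   # midpoint rule for int_0^T x_t x_t' dt
        x = features(t, times, edges)
        G += np.outer(x, x) * h
    b = np.sum([features(t, times, edges) for t in times], axis=0)  # sum of x_{t_m}
    return G, b


def lasso_contrast(G, b, gamma, n_iter=5000):
    """Minimize theta' G theta - 2 b' theta + gamma * ||phi||_1 by ISTA (mu not penalized)."""
    theta = np.zeros(b.size)
    step = 1.0 / (2.0 * np.linalg.eigvalsh(G).max())   # inverse Lipschitz constant of the gradient
    for _ in range(n_iter):
        z = theta - step * (2.0 * G @ theta - 2.0 * b)
        phi = np.sign(z[1:]) * np.maximum(np.abs(z[1:]) - step * gamma, 0.0)
        theta = np.concatenate(([z[0]], phi))
    return theta[0], theta[1:]
```

When the penalty is large enough to set every $\phi_k$ to zero, the minimizer reduces to $\hat\mu = b_0 / G_{00}$, i.e., the number of events divided by $T$; conditioning issues of $G$ are exactly the ones discussed above.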

The Wiener-Hopf approach. Let us point out that, since $\lambda^i_t(\mu, \Phi)$ is expressed linearly
in terms of $\mu^i$ and of the $\Phi^{ij}$, minimizing the error (75) is equivalent
to solving a linear equation, which is nothing but the Wiener-Hopf Eq. (38).
Consequently, minimizing the expectation of the contrast function is equivalent to solving
the Wiener-Hopf equation. In Cl!!], the authors chose to perform non-parametric
estimation by directly inverting this system. They proved that, as long as
$g(t)$ corresponds to the conditional intensity of a Hawkes process, (38) has a unique
solution in $y(t)$, which is the kernel matrix $\Phi(t)$. The algorithm uses a quadrature
technique to discretize the system. It can be summarized as follows:

• Estimation of the vector $\Lambda$ (simply using, as an estimator of $\Lambda^i$, the number of
jumps of the realization of $N^i$ divided by the overall realization time).

• Non-parametric estimation of the matrix $g(t)$ defined by (36) (using kernel
density estimation techniques).

• Choice of the number of quadrature points (e.g., Gaussian quadrature) to be used,
as well as of the support of all the kernel functions.

• Discretization of the Wiener-Hopf system on the quadrature points and inversion
of the resulting linear system. This leads to an estimation of the kernel functions on
the quadrature points. Estimation on a finer grid, as well as of the kernel norms, can
be obtained through simple quadrature formulas.

• Estimation of the exogenous intensity $\mu$ using $\hat\mu = (\mathbb{I} - \|\hat\Phi\|)\hat\Lambda$.

We refer the reader to [8], where the full algorithm (including the methodology for
choosing the bandwidth for kernel density estimation as well as the number of
quadrature points) is described precisely, along with comparisons with the other
methods presented above. It has been improved in [5] to take particular
care of slowly decreasing kernels.
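The discretization step can be illustrated on a 1-dimensional example in which $g$ is known in closed form instead of being estimated from data. For $\Phi(t) = \alpha\beta e^{-\beta t}$, one can check that $g(t) = K e^{-\beta(1-\alpha)|t|}$ with $K = \alpha\beta(2-\alpha)/(2(1-\alpha))$ solves $\Phi(t) + \int_0^\infty \Phi(s)\, g(t-s)\, ds = g(t)$ for $t > 0$, which is the form the Wiener-Hopf system is assumed to take here for $D = 1$ (with $g$ even). The Python sketch uses a uniform grid with trapezoidal weights rather than Gaussian quadrature (a hypothetical simplification that places the kink of $g$ at 0 on grid points), solves the linear system, and finally applies $\hat\mu = (1 - \|\hat\Phi\|)\hat\Lambda$; all names and numerical values are illustrative.

```python
import numpy as np


def solve_wiener_hopf_1d(g, t_max, n):
    """Nystrom discretization of Phi(t) + int_0^t_max Phi(s) g(t - s) ds = g(t) on [0, t_max].
    g must be an even, vectorized function. Returns grid, Phi on the grid, weights."""
    t = np.linspace(0.0, t_max, n)
    w = np.full(n, t[1] - t[0])
    w[0] = w[-1] = 0.5 * (t[1] - t[0])            # trapezoidal quadrature weights
    G = g(np.abs(t[:, None] - t[None, :]))        # G[i, j] = g(t_i - t_j), using evenness
    A = np.eye(n) + G * w[None, :]                 # (I + G diag(w)) Phi = g
    phi = np.linalg.solve(A, g(t))
    return t, phi, w


alpha, beta = 0.5, 1.0
K = alpha * beta * (2 - alpha) / (2 * (1 - alpha))
g = lambda x: K * np.exp(-beta * (1 - alpha) * np.abs(x))
t, phi, w = solve_wiener_hopf_1d(g, t_max=30.0, n=601)
norm_hat = np.sum(w * phi)                         # quadrature estimate of ||Phi||
Lambda_hat = 2.0                                   # hypothetical estimated average intensity
mu_hat = (1.0 - norm_hat) * Lambda_hat             # exogenous intensity estimate
```

In the multi-dimensional case, the same construction yields a block linear system coupling all the $\Phi^{ij}$ on the quadrature points.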

As compared to the approach developed in [33], this algorithm is better suited
to the case in which a large amount of data is available. As compared to the EM
approach, we advocate its use mainly in two cases (which are often met when
dealing with financial data):

• Either the kernel functions are not localized (e.g., power-law), in which case
the EM algorithm is known to be very slow to converge (see [47] and 0).

• Or some of the kernel functions take negative values (see the second drawback
of the EM algorithm above). We refer the reader to [33] for a justification of why
the Wiener-Hopf approach can deal with negative kernels.

Finally, let us point out that, in the particular case where the kernel matrix is known
to be symmetric (which is always true if the dimension $D = 1$), the method developed
in [2] uses a spectral method for inverting (23) and deduces an estimation from the
second-order statistics. It can be seen as a particularly elegant way of solving the
Wiener-Hopf equation.